CLASSIFICATION AND PROGNOSIS PREDICTION OF ACUTE
LYMPHOBLASTIC LEUKEMIA BY GENE EXPRESSION PROFILING 



FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
The research underlying this invention was supported in part with funds from
National Institutes of Health grants P01 CA71907-06, CA51001, CA36401, 
CA78224, Cancer Center CORE Grant CA-21765, and National Science Foundation 
grant EIA-0074869. The United States Government may have an interest in the 
subject matter of the invention. 

BACKGROUND OF THE INVENTION 
Pediatric acute lymphoblastic leukemia (ALL) is one of the great success 
stories of modern cancer therapy, with contemporary treatment protocols achieving 
overall long-term event-free survival rates approaching 80% (Schrappe et al. (2000)
Blood 95:3310-22; Silverman et al. (2001) Blood 97:1211-18; and Pui and Evans
(1998) N. Engl. J. Med. 339:605-15). This success has been achieved in part by using
risk-adapted therapy that involves tailoring the intensity of treatment to each patient's 
risk of relapse. This approach was developed following the realization that pediatric 
ALL is a heterogeneous disease consisting of various leukemia subtypes that differ 
markedly in their response to chemotherapy (reviewed in Pui and Evans (1998) N.
Engl. J. Med. 339:605-15). By tailoring the intensity of treatment to a patient's
relative risk of relapse, patients are neither under-treated nor over-treated, and are thus
afforded the highest chance for a cure. 

Critical to the success of this approach has been the accurate assignment of 
individual patients to specific risk groups. Although risk assignment is influenced by 
a variety of clinical and laboratory parameters, the genetic alterations that underlie the 
pathogenesis of individual leukemia subtypes figure prominently in most 
classification schemes (Silverman et al. (2001) Blood 97:1211-18; and Pui and
Evans (1998) N. Engl. J. Med. 339:605-15). Through systematic immunophenotyping
and cytogenetic analysis, and the subsequent molecular cloning of the genes targeted 
by the identified chromosomal rearrangements, a number of genetically distinct 
leukemia subtypes have been defined. These include B-lineage leukemias that 
contain t(9;22)[BCR-ABL], t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1],
rearrangements in the MLL gene on chromosome 11, band q23, or a hyperdiploid
karyotype (i.e., >50 chromosomes), and T-lineage leukemias (T-ALL) (Silverman et
al. (2001) Blood 97:1211-18; and Pui and Evans (1998) N. Engl. J. Med. 339:605-15).
The underlying genetic lesions in these leukemia subtypes influence the response to 
cytotoxic drugs. For example, leukemias that express the E2A-PBX1 fusion protein 
respond poorly to conventional antimetabolite-based treatment, but have cure rates
approaching 80% when treated with more intensive therapies (Raimondi et al. (1990)
J. Clin. Oncol. 8:1380-88; and Hunger (1996) Blood 87:1211-1224). Similarly,
BCR-ABL-expressing ALLs, and ALLs in infants with MLL rearrangements, have
exceedingly poor cure rates with conventional chemotherapy, and allogeneic
hematopoietic stem cell transplantation with an HLA-matched sibling donor has
already been shown to improve outcome for patients with the former leukemia
subtype (Pui et al. (1991) Blood 77:440-46; Heerema et al. (1999) Leukemia
13:679-86; Arico et al. (2000) N. Engl. J. Med. 342:998-1006; and Biondi et al.
(2000) Blood 96:24-33).

Unfortunately, the accurate assignment of patients to specific risk groups is a 
difficult and expensive process, requiring intensive laboratory studies including 
immunophenotyping, cytogenetics, and molecular diagnostics (Pui and Evans (1998)
N. Engl. J. Med. 339:605-15; and Pui et al. (2001) Lancet Oncology 2:597-607).
Moreover, these diagnostic approaches require the collective expertise of a number of 
professionals, and although this expertise is available at most major medical centers, it 
is generally unavailable in developing countries. Accordingly, there remains a need 
for rapid, less expensive methods of assigning patients affected by ALL into known 
leukemia risk groups and identifying patients for whom there is a high risk that 
conventional therapeutic approaches will fail. 

BRIEF SUMMARY OF THE INVENTION 
The present invention provides methods and compositions useful for 
diagnosing and choosing treatment for subjects affected by leukemia. The claimed 
methods include methods of assigning a subject affected by leukemia to a leukemia
risk group, methods of predicting whether a subject affected by leukemia has an 
increased risk of relapse, methods of predicting whether a subject affected by 
leukemia has an increased risk of developing secondary acute myeloid leukemia 
(AML), methods to aid in the determination of a prognosis for a subject affected by 
leukemia, methods of choosing a therapy for a subject affected by leukemia, and 
methods of monitoring the disease state in a subject undergoing one or more therapies 
for leukemia. Methods of screening test compounds to identify therapeutic 
compounds useful for the treatment of leukemia and molecular targets for these 
therapeutic compounds are also provided. 

The claimed methods comprise providing an expression profile of a sample 
from a subject affected by leukemia and comparing this subject expression profile to 
one or more reference expression profiles. In one embodiment, the reference profiles 
are associated with leukemia risk groups, and the subject expression profile is 
compared to one or more of these risk group reference profiles to thereby assign the 
subject affected by leukemia to a leukemia risk group. In another embodiment, one or 
more reference profiles are associated with relapse of leukemia and the subject 
expression profile is compared to one or more of these relapse reference profiles to 
determine if the subject has an increased risk of relapse. In yet another embodiment, 
one or more reference profiles are associated with secondary AML, and the subject 
expression profile is compared to one or more of these reference profiles to determine 
whether the subject has an increased risk of developing secondary AML. 

The present invention also provides compositions useful for diagnosing and
choosing a therapy for subjects affected by leukemia. These compositions include 
arrays comprising a plurality of capture probes that can bind specifically to nucleic 
acid molecules that are differentially expressed in leukemia risk groups, in leukemia 
subjects who have relapsed, or in leukemia subjects who have developed secondary 
AML. Also provided is a computer-readable medium comprising digitally-encoded 
expression profiles comprising values representing the expression levels of genes that 
are differentially expressed in leukemia risk groups, in leukemia subjects who have 
relapsed, or in leukemia subjects who have developed secondary AML. Additional 
compositions of the invention include kits comprising an array of capture probes that 
can bind specifically to nucleic acid molecules that are differentially expressed in 
leukemia risk groups, in leukemia subjects who have relapsed, or in leukemia subjects
who have developed secondary AML, and a computer-readable medium having 
digitally encoded expression profiles with values representing the expression level of 
a nucleic acid molecule detected by the array. 

DETAILED DESCRIPTION OF THE INVENTION 
The present invention provides a single platform, expression analysis, that can 
accurately identify each of the known prognostically and therapeutically relevant 
subgroups of leukemia and predict the risk of relapse and the risk of secondary 
(therapy-induced) AML in patients having leukemia. The methods and compositions 
of the invention provide tools useful in choosing a therapy for leukemia patients, 
including methods for assigning a leukemia patient to a leukemia risk group, methods 
of predicting whether a leukemia patient has an increased risk of relapse, methods of 
predicting whether a leukemia patient has an increased risk of developing secondary 
(therapy-induced) AML, methods of choosing a therapy for a leukemia patient, 
methods of determining the efficacy of a therapy in a leukemia patient, and methods 
of determining the prognosis for a leukemia patient. 

The methods of the invention comprise the steps of providing an expression 
profile from a sample from a subject affected by leukemia and comparing this subject 
expression profile to one or more reference profiles that are associated with a 
particular physiologic condition, such as a leukemia risk group, the occurrence of 
relapse, or the development of secondary AML. By identifying the leukemia risk 
group reference profile that is most similar to the subject expression profile, the 
subject can be assigned to a leukemia risk group. Similarly, the risk that a subject 
affected by leukemia will relapse or develop secondary AML can be predicted by 
determining whether the expression profile from the subject is sufficiently similar to a 
reference profile associated with relapse or a reference profile associated with the 
development of secondary AML. 
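The assignment step described above can be sketched as follows. This is a minimal illustration only: it assumes that each profile is a list of expression values over the same genes in the same order, and it uses Pearson correlation as the similarity measure, which is an assumption made for this example rather than a measure named in the specification. The function names and the toy reference profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_similar_reference(subject, references):
    """Return (name, correlation) of the reference profile closest to subject."""
    scored = {name: pearson(subject, ref) for name, ref in references.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical reference profiles over the same five genes, in the same order.
references = {
    "T-ALL":    [9.0, 8.5, 1.0, 1.2, 5.0],
    "E2A-PBX1": [1.0, 1.5, 9.0, 8.0, 5.0],
}
subject = [8.0, 9.0, 2.0, 1.0, 4.5]
group, score = most_similar_reference(subject, references)
```

Here the subject would be assigned to the risk group whose reference profile gives the highest correlation; a separate significance check would still be needed to decide whether that similarity is sufficient.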

In another embodiment, the subject expression profile is from a subject affected by 
leukemia who is undergoing a therapy to treat the leukemia. The subject expression 
profile is compared to one or more reference expression profiles of the invention to 
monitor the efficacy of the therapy. 

Expression Profiles

As used herein, an "expression profile" comprises one or more values 
corresponding to a measurement of the relative abundance of a gene expression 
product. Such values may include measurements of RNA levels or protein 
abundance. Thus, the expression profile can comprise values representing the 
measurement of the transcriptional state or the translational state of the gene. See
U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,344,316, and 6,033,860, which are
hereby incorporated by reference in their entireties. 

The transcriptional state of a sample includes the identities and relative 
abundance of the RNA species, especially mRNAs present in the sample. Preferably, 
a substantial fraction of all constituent RNA species in the sample are measured, but 
at least a sufficient fraction to characterize the transcriptional state of the sample is 
measured. The transcriptional state can be conveniently determined by measuring 
transcript abundance by any of several existing gene expression technologies. 

Translational state includes the identities and relative abundance of the 
constituent protein species in the sample. As is known to those of skill in the art, the 
transcriptional state and translational state are related. 

In some embodiments, the expression profiles of the present invention are 
generated from samples from subjects affected by leukemia, including subjects having 
leukemia, subjects suspected of having leukemia, subjects having a propensity to 
develop leukemia, or subjects who have previously had leukemia, or subjects 
undergoing therapy for leukemia. The samples from the subject used to generate the 
expression profiles of the present invention can be derived from a variety of sources 
including, but not limited to, single cells, a collection of cells, tissue, cell culture, 
bone marrow, blood, or other bodily fluids. The tissue or cell source may include a 
tissue biopsy sample, a cell sorted population, cell culture, or a single cell. Sources 
for the sample of the present invention include cells from peripheral blood or bone 
marrow, such as blast cells from peripheral blood or bone marrow. 

In selecting a sample, the percentage of the sample that constitutes cells 
having differential gene expression in leukemia risk groups, relapse, or secondary 
AML should be considered. Samples may comprise at least 20%, at least 30%, at 
least 40%, at least 50%, at least 55%, at least 60%, at least 70%, at least 75%, at least 
80%, at least 85%, at least 90%, or at least 95% cells having differential expression in 
leukemia risk groups, relapse, or secondary AML, with a preference for samples
having a higher percentage of such cells. In some embodiments, these cells are blast 
cells, such as leukemic cells. The percentage of a sample that constitutes blast cells 
may be determined by methods well known in the art; see, for example, the methods 
described elsewhere herein. 

In some embodiments of the present invention, the expression profiles 
comprise values representing the expression levels of genes that are differentially 
expressed in leukemia risk groups, in subjects affected by leukemia who have 
relapsed, or in subjects affected by leukemia who have developed secondary AML. 
The term "differentially expressed" as used herein means that the measurement of a 
cellular constituent varies in two or more samples. The cellular constituent may be 
upregulated in a sample from a subject having one physiologic condition in 
comparison with a sample from a subject having a different physiologic condition, or 
downregulated in a sample from a subject having one physiologic condition in
comparison with a sample from a subject having a different physiologic condition.
For example, in one embodiment, the differentially expressed genes of the present 
invention may be expressed at different levels in different leukemia risk groups. In 
another embodiment, the differentially expressed genes are expressed at different
levels in subjects affected by leukemia who will relapse after conventional treatment
in comparison with subjects affected by leukemia who will not relapse and thus will
remain in continuous complete remission. In yet another embodiment, the
differentially expressed genes are expressed at different levels in subjects affected by
leukemia who will develop secondary AML in comparison with subjects affected by
leukemia who will not develop secondary AML.

The present invention provides groups of genes that are differentially 
expressed in diagnostic leukemia samples of patients in different risk groups, or in 
patients that go on to develop a relapse or a therapy induced (secondary) AML. Some 
of these genes were identified based on gene expression levels for 12,600 probes in 
360 leukemia samples. Values representing the expression levels of the nucleic acid 
molecules detected by the probes were analyzed using five different statistical metrics 
to identify genes that were differentially expressed in leukemia risk groups. The 
methods used to analyze the expression level values to identify differentially 
expressed genes were the Chi-square statistics method, the Correlation-based Feature 
Selection method, the T-statistics method, the Wilkins' method, and the
self-organizing map and discriminant analysis with variance metric. Although different
methods of analysis resulted in the selection of different groups of differentially
expressed genes, the genes selected by each method could be used to create an
expression profile that could accurately determine whether a leukemia patient should
be assigned to a risk group, with an overall diagnostic accuracy of about 96%. See
the Experimental section.

Additional genes that are differentially expressed in diagnostic leukemia 
samples were identified based on gene expression levels for 26,825 probes in a subset 
of 132 leukemia samples selected from the 360 leukemia samples described above. A 
chi-squared metric followed by a permutation test was used to identify discriminating
genes for the T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL rearrangement, and
Hyperdiploid >50 chromosomes risk groups. Genes whose expression is limited to a single B-cell
lineage were also identified, and are provided in Tables 70-74. 

Thus, distinct sets of differentially expressed genes that can be used to 
distinguish the T-lineage, hyperdiploid >50 chromosomes, BCR-ABL, E2A-PBX1, 
TEL-AML1, and MLL gene rearrangement risk groups are provided. Examples of 
genes that are differentially expressed in the T-ALL risk group are shown in Tables 7, 
14, 21, 28, 35, 59, and 67. Examples of genes that are differentially expressed in the 
E2A-PBX1 risk group are shown in Tables 3, 10, 17, 24, 31, 55, 64, and 71. 
Examples of genes that are differentially expressed in the TEL-AML1 risk group are 
shown in Tables 8, 15, 22, 29, 36, 60, 68, and 74. Examples of genes that are 
differentially expressed in the BCR-ABL risk group are shown in Tables 2, 9, 16, 23, 
30, 54, 63, and 70. Examples of genes that are differentially expressed in the MLL 
risk group are shown in Tables 5, 12, 19, 26, 33, 57, 66, and 73. Examples of genes 
that are differentially expressed in the Hyperdiploid >50 risk group are shown in 
Tables 4, 11, 18, 25, 32, 56, 65, and 72. 

The present invention further provides a seventh leukemia risk group, herein 
termed "Novel," that can be distinguished from the previously-described leukemia 
risk groups based on expression profiling. The expression profiles from subjects in 
the Novel risk group are distinguishable from those of the T-ALL, E2A-PBX1, TEL- 
AML1, BCR-ABL, MLL, and Hyperdiploid >50 risk groups. Subjects assigned to the 
Novel risk group have similar expression profiles. Examples of genes that are 
differentially expressed in the Novel leukemia risk group are shown in Tables 4, 11,
18, 25, 32, and 58.

Similarly, sets of differentially expressed genes associated with leukemia 
patients in the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, and Other (i.e., not the
T-ALL, hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or BCR-ABL) risk groups 
who have undergone relapse were identified. Examples of differentially expressed 
genes associated with relapse in subjects in the T-ALL risk group are shown in Table 
44. Examples of differentially expressed genes associated with relapse in subjects in 
the hyperdiploid >50 risk group are shown in Table 45. Examples of differentially 
expressed genes associated with relapse in subjects in the TEL-AML1 risk group are 
shown in Table 46. Examples of differentially expressed genes associated with relapse 
in subjects in the MLL risk group are shown in Table 47. Examples of differentially 
expressed genes associated with relapse in subjects in the E2A-PBX1, BCR-ABL, and 
Novel risk groups are shown in Table 48.

The invention also provides genes that are differentially expressed in subjects 
affected by TEL-AML1 who have developed secondary (treatment-induced) AML. 
Examples of such genes are shown in Table 52. 

The present invention also reveals genes with a high differential level of 
expression in leukemic compared to normal cells. These highly differentially 
expressed genes are selected from the genes shown in Tables 2-36, 44-48, 63-68,
and 70-74. These genes and their expression products are useful as markers to detect 
the presence of minimal residual disease (MRD) in a patient. Antibodies or other 
reagents or tools may be used to detect the presence of these telltale markers of MRD. 

The expression profiles of the invention comprise one or more values 
representing the expression level of a gene having differential expression in a 
leukemia risk group, in subjects affected by leukemia who will relapse after 
conventional therapy, or in subjects affected by leukemia who will develop secondary 
AML after conventional therapy. Each expression profile contains a sufficient 
number of values such that the profile can be used to distinguish one leukemia risk 
group from another, or to distinguish subjects who will relapse after conventional 
therapy from those who will not relapse, or to distinguish subjects who will develop 
secondary AML after conventional therapy from those who will not develop 
secondary AML. In some embodiments, the expression profiles comprise only one 
value. For example, it can be determined whether a subject affected by leukemia is in
the T-ALL risk group based only on the expression level of the CD3D antigen (NCBI 
Accession No. AA919102; see Table 14). Similarly, it can be determined whether a 
subject affected by leukemia is in the E2A-PBX1 risk group based only on the 
expression level of the cDNA of NCBI Accession No. AL049381 (see Table 10). In 
other embodiments, the expression profile comprises more than one value 
corresponding to a differentially expressed gene, for example at least 2 values, at least 
3 values, at least 4 values, at least 5 values, at least 6 values, at least 7 values, at least 
8 values, at least 9 values, at least 10 values, at least 11 values, at least 12 values, at 
least 13 values, at least 14 values, at least 15 values, at least 16 values, at least 17 
values, at least 18 values, at least 19 values, at least 20 values, at least 22 values, at 
least 25 values, at least 27 values, at least 30 values, at least 35 values, at least 40
values, at least 45 values, at least 50 values, at least 75 values, at least 100 values, at 
least 125 values, at least 150 values, at least 175 values, at least 200 values, at least 
250 values, at least 300 values, at least 400 values, at least 500 values, at least 600 
values, at least 700 values, at least 800 values, at least 900 values, at least 1000 
values, at least 1200 values, at least 1500 values, or at least 2000 or more values. 

It is recognized that the diagnostic accuracy of assigning a subject to a 
leukemia risk group, determining whether a subject has an increased risk for relapse,
or determining whether a subject has an increased risk of developing secondary AML 
will vary based on the number of values contained in the expression profile. 
Generally, the number of values contained in the expression profile is selected such 
that the diagnostic accuracy is at least 85%, at least 87%, at least 90%, at least 91%, at 
least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 
98%, or at least 99%, as calculated using methods described elsewhere herein, with an 
obvious preference for higher percentages of diagnostic accuracy. 
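The selection of a profile size against a target accuracy can be sketched as below. This is an illustrative sketch only: it takes diagnostic accuracy to be the fraction of subjects assigned to their known risk group, which is an assumption for this example, and the function names, labels, and accuracy figures are hypothetical placeholders rather than results from the specification.

```python
def diagnostic_accuracy(predicted, actual):
    """Fraction of subjects whose predicted risk group matches the known one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def smallest_profile_size(accuracy_by_size, target):
    """Smallest number of values whose measured accuracy meets the target.

    accuracy_by_size -- dict mapping number of values in the profile to the
                        accuracy measured with a profile of that size
    Returns None when no profile size reaches the target.
    """
    eligible = [size for size, acc in accuracy_by_size.items() if acc >= target]
    return min(eligible) if eligible else None


# Hypothetical predictions for four subjects with known risk groups.
predicted = ["T-ALL", "MLL", "MLL", "BCR-ABL"]
actual = ["T-ALL", "MLL", "TEL-AML1", "BCR-ABL"]
accuracy = diagnostic_accuracy(predicted, actual)

# Hypothetical accuracies measured for profiles of increasing size.
accuracy_by_size = {1: 0.82, 5: 0.91, 20: 0.96, 50: 0.97}
size = smallest_profile_size(accuracy_by_size, target=0.95)
```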

It is recognized that the diagnostic accuracy of assigning a subject to a
leukemia risk group, determining whether a subject has an increased risk for relapse, 
or determining whether a subject has an increased risk of developing secondary AML 
will vary based on the strength of the correlation between the expression levels of the 
differentially expressed genes and the associated physiologic condition. When the 
values in the expression profiles represent the expression levels of genes whose 
expression is strongly correlated with the physiologic condition, it may be possible to 
use fewer values in the expression profile and still obtain an acceptable
level of diagnostic or prognostic accuracy. 

The strength of the correlation between the expression level of a differentially 
expressed gene and the presence or absence of a particular physiologic state may be 
determined by a statistical test of significance. For example, the chi square test used 
to select genes in some embodiments of the present invention assigns a chi square 
value to each differentially expressed gene, indicating the strength of the correlation 
of the expression of that gene and the presence or absence of the associated 
physiologic condition. Similarly, the T-statistics metric and the Wilkins' metric both 
provide a value or score indicative of the strength of the correlation between the 
expression of the gene and the absence or presence of the associated physiologic 
conditions. These scores may be used to select the genes whose expression levels 
have the greatest correlation with a particular physiologic state in order to increase the 
diagnostic or prognostic accuracy of the methods of the invention, or in order to 
reduce the number of values contained in the expression profile while maintaining the 
diagnostic or prognostic accuracy of the expression profile. 

For example, in one embodiment the chi square test is used to determine the 
significance of the differentially expressed genes whose expression levels are 
included in the array, and only those genes having a chi square value of more than 20, 
more than 25, more than 30, more than 35, more than 40, more than 45, more than 50, 
more than 55, more than 60, more than 65, more than 70, more than 75, more than 80, 
more than 90, more than 100, more than 120, more than 140, more than 160, more 
than 180, or more than 200 are selected. 
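A minimal sketch of this thresholding follows. It assumes that each gene's expression has already been discretized into a "high" or "low" call per sample; the discretization step, the function names, and the toy data are illustrative assumptions and are not taken from the specification. The sketch computes the Pearson chi-square statistic for a 2x2 table of expression call versus risk-group membership and keeps the genes whose value exceeds a chosen cutoff.

```python
def chi_square_2x2(calls, in_group):
    """Pearson chi-square statistic for a 2x2 table.

    calls    -- list of bools, True if the gene is called "high" in a sample
    in_group -- list of bools, True if the sample belongs to the risk group
    """
    n = len(calls)
    # Observed counts: rows = high/low call, columns = in group / not in group.
    observed = [[0, 0], [0, 0]]
    for call, member in zip(calls, in_group):
        observed[0 if call else 1][0 if member else 1] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def select_genes(calls_by_gene, in_group, cutoff):
    """Return the genes whose chi-square value is more than the cutoff."""
    return [gene for gene, calls in calls_by_gene.items()
            if chi_square_2x2(calls, in_group) > cutoff]


# Toy data (hypothetical): 40 samples, the first 20 in the risk group.
in_group = [True] * 20 + [False] * 20
calls_by_gene = {
    "gene_A": [True] * 20 + [False] * 20,   # perfectly tracks the group
    "gene_B": [True, False] * 20,           # unrelated to the group
}
selected = select_genes(calls_by_gene, in_group, cutoff=20)
```

Raising the cutoff keeps fewer, more strongly associated genes, which is the trade-off between profile size and accuracy discussed above.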

In another embodiment, the T-statistics metric is used to determine the
significance of the differentially expressed genes whose expression levels are 
included in the array, and only those genes with a score having an absolute value of 
greater than 4, greater than 5, greater than 6, greater than 7, greater than 8, greater 
than 9, greater than 10, greater than 12, greater than 25, greater than 27, greater than 
30, or greater than 35 are selected. 
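The score-based filter can be sketched as follows. The exact form of the T-statistics metric is not given in this passage, so the example assumes an unequal-variance (Welch-style) two-sample t-statistic comparing samples inside the risk group with all other samples; that choice, the function names, and the toy expression values are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_score(group_values, other_values):
    """Welch-style t-statistic comparing two sets of expression values.

    Assumes at least two values per set and non-zero pooled standard error.
    """
    se = sqrt(variance(group_values) / len(group_values)
              + variance(other_values) / len(other_values))
    return (mean(group_values) - mean(other_values)) / se


def select_by_t(expression, in_group, cutoff):
    """Keep genes whose score has an absolute value greater than the cutoff.

    expression -- dict mapping gene name to a list of per-sample values
    in_group   -- list of bools marking samples in the risk group
    """
    selected = []
    for gene, values in expression.items():
        inside = [v for v, m in zip(values, in_group) if m]
        outside = [v for v, m in zip(values, in_group) if not m]
        if abs(t_score(inside, outside)) > cutoff:
            selected.append(gene)
    return selected


# Toy data (hypothetical): eight samples, the first four in the risk group.
in_group = [True] * 4 + [False] * 4
expression = {
    "gene_X": [10, 11, 9, 10, 2, 3, 1, 2],   # strongly higher in the group
    "gene_Y": [5, 6, 4, 5, 5, 4, 6, 5],      # no difference between groups
}
selected = select_by_t(expression, in_group, cutoff=4)
```

Using the absolute value keeps genes that are either upregulated or downregulated in the risk group.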

In yet another embodiment, the Wilkins' metric is used to determine the 
significance of the differentially expressed genes whose expression levels are 
included in the array, and only those genes having a score of greater than 0.55, greater 
than 0.57, greater than 0.59, greater than 0.61, greater than 0.63, greater than 0.65, 
greater than 0.67, greater than 0.69, greater than 0.71, greater than 0.73, greater than
0.75, greater than 0.77, greater than 0.79, greater than 0.81, greater than 0.83, or 
greater than 0.85 are selected. 

Each value in the expression profiles of the invention is a measurement
representing the absolute or the relative expression level of a differentially expressed
gene. The expression levels of these genes may be determined by any method
known in the art for assessing the expression level of an RNA or protein molecule in a 
sample. For example, expression levels of RNA may be monitored using a membrane 
blot (such as used in hybridization analysis such as Northern, Southern, dot, and the 
like), or microwells, sample tubes, gels, beads or fibers (or any solid support
comprising bound nucleic acids). See U.S. Patent Nos. 5,770,722, 5,874,219, 
5,744,305, 5,677,195 and 5,445,934, which are expressly incorporated herein by 
reference. The gene expression monitoring system may also comprise nucleic acid 
probes in solution. 

In one embodiment of the invention, microarrays are used to measure the 
values to be included in the expression profiles. Microarrays are particularly well 
suited for this purpose because of the reproducibility between different experiments. 
DNA microarrays provide one method for the simultaneous measurement of the 
expression levels of large numbers of genes. Each array consists of a reproducible 
pattern of capture probes attached to a solid support. Labeled RNA or DNA is 
hybridized to complementary probes on the array and then detected by laser scanning. 
Hybridization intensities for each probe on the array are determined and converted to 
a quantitative value representing relative gene expression levels. See the
Experimental section. See also U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135,
6,033,860, and 6,344,316, which are incorporated herein by reference. High-density
oligonucleotide arrays are particularly useful for determining the gene expression
profile for a large number of RNAs in a sample.

In one approach, total mRNA isolated from the sample is converted to labeled
cRNA and then hybridized to an oligonucleotide array. Each sample is hybridized to
a separate array. Relative transcript levels are calculated by reference to appropriate
controls present on the array and in the sample. See, for example, the Experimental
section.

In another embodiment, the values in the expression profile are obtained by
measuring the abundance of the protein products of the differentially-expressed genes. 
The abundance of these protein products can be determined, for example, using 
antibodies specific for the protein products of the differentially-expressed genes. The 
term "antibody" as used herein refers to an immunoglobulin molecule or 
immunologically active portion thereof, i.e., an antigen-binding portion. Examples of 
immunologically active portions of immunoglobulin molecules include F(ab) and
F(ab')2 fragments, which can be generated by treating the antibody with an enzyme
such as pepsin.

The antibody can be polyclonal, monoclonal, recombinant (e.g., chimeric
or humanized), fully human, non-human (e.g., murine), or a single chain antibody. In a
preferred embodiment it has effector function and can fix complement. The antibody 
can be coupled to a toxin or imaging agent. 

A full-length protein product from a differentially-expressed gene, or an 
antigenic peptide fragment of the protein product can be used as an immunogen. 
Preferred epitopes encompassed by the antigenic peptide are regions of the protein 
product of the differentially expressed gene that are located on the surface of the 
protein, e.g., hydrophilic regions, as well as regions with high antigenicity. The 
antibody can be used to detect the protein product of the differentially expressed gene 
in order to evaluate the abundance and pattern of expression of the protein. These 
antibodies can also be used diagnostically to monitor protein levels in tissue as part of 
a clinical testing procedure, e.g., to determine the efficacy of a given
therapy. Detection can be facilitated by coupling (i.e., physically linking) the 
antibody to a detectable substance (i.e., antibody labeling). Examples of detectable 
substances include various enzymes, prosthetic groups, fluorescent materials, 
luminescent materials, bioluminescent materials, and radioactive materials. Examples 
of suitable enzymes include horseradish peroxidase, alkaline phosphatase,
β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group
complexes include streptavidin/biotin and avidin/biotin; examples of suitable 
fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, 
rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an 
example of a luminescent material is luminol; examples of bioluminescent
materials include luciferase, luciferin, and aequorin; and examples of suitable
radioactive materials include 125I, 131I, 35S, or 3H.

Once the values comprised in the subject expression profile and the reference 
expression profile or expression profiles are established, the subject profile is 
compared to the reference profile to determine whether the subject expression profile 
is sufficiently similar to the reference profile. Alternatively, the subject expression 
profile is compared to a plurality of reference expression profiles to select the 
reference expression profile that is most similar to the subject expression profile. 
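Where several labeled reference profiles are available for each condition, selecting the most similar ones can be sketched as a nearest-neighbor vote. This is a minimal illustration that assumes Euclidean distance as the dissimilarity measure and a simple majority vote among the k closest references; the distance measure, the value of k, the function name, and the toy profiles are all assumptions made for this example.

```python
from collections import Counter
from math import dist


def knn_assign(subject, labeled_references, k=3):
    """Assign the label held by the majority of the k closest references.

    subject            -- list of expression values for the subject
    labeled_references -- list of (label, profile) pairs over the same genes
    """
    ranked = sorted(labeled_references, key=lambda item: dist(subject, item[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled reference profiles over three genes.
labeled_references = [
    ("TEL-AML1", [7.0, 2.0, 1.0]),
    ("TEL-AML1", [6.5, 2.5, 1.5]),
    ("TEL-AML1", [7.5, 1.5, 0.5]),
    ("MLL", [1.0, 6.0, 7.0]),
    ("MLL", [1.5, 6.5, 6.0]),
    ("MLL", [0.5, 7.0, 6.5]),
]
subject = [6.8, 2.2, 1.1]
assigned = knn_assign(subject, labeled_references, k=3)
```

With k set to 1 this reduces to choosing the single most similar reference profile.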

Any method known in the art for comparing two or more data sets to detect 
similarity between them may be used to compare the subject expression profile to the 
reference expression profiles. In some embodiments, the subject expression profile 
and the reference profile are compared using a supervised learning algorithm such as 
the support vector machine (SVM) algorithm, prediction by collective likelihood of 
emerging patterns (PCL) algorithm, the k-nearest neighbor algorithm, or the Artificial 
Neural Network algorithm. Each of these algorithms is described in the Experimental 
section of the application. To determine whether a subject expression profile shows 
"statistically significant similarity" or "sufficient similarity" to a reference profile, 
statistical tests may be performed to determine whether the similarity between the 
subject expression profile and the reference expression profile is likely to have been 
achieved by a random event. An example of such a statistical test is the permutation 
test described in the Experimental section; however, any statistical test that can 
calculate the likelihood that the similarity between the subject expression profile and 
the reference profile results from a random event can be used. The accuracy of 
assigning a subject to a risk group based on similarity between an expression profile 
for the subject and an expression profile for the risk group depends in part on the 
degree of similarity between the two profiles. Therefore, when more accurate 
diagnoses are required, the stringency with which the similarity between the subject 
expression profile and the reference profile is evaluated should be increased. For 
example, in various embodiments, the p-value obtained when comparing the subject 
expression profile to a reference profile that shares sufficient similarity with the 
subject expression profile is less than 0.20, less than 0.15, less than 0.10, less than 
0.09, less than 0.08, less than 0.07, less than 0.06, less than 0.05, less than 0.04, less 
than 0.03, less than 0.02, or less than 0.01. 
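
The significance test described above can be sketched as follows. This is not the permutation test of the Experimental section; it is a generic illustration that assumes Pearson correlation as the similarity measure and builds the null distribution by shuffling the subject's values across genes. The function names, the seed, and the toy data are assumptions made for the example.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(subject, reference, n_permutations=1000, seed=0):
    """Estimate how often a randomly shuffled subject profile is at least as
    similar to the reference as the observed subject profile."""
    rng = random.Random(seed)
    observed = pearson(subject, reference)
    shuffled = list(subject)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(shuffled, reference) >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical ten-gene profiles (values are made up).
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
subject = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.0, 8.1, 9.2, 9.9]
p_value = permutation_p_value(subject, reference)
sufficiently_similar = p_value < 0.05  # one of the thresholds listed above
```

Lowering the threshold from 0.05 toward 0.01 corresponds to the increased stringency discussed in the text.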

In some embodiments, the assignment of a subject affected by leukemia to a 
leukemia risk group, the prediction of whether a subject affected by leukemia has an 
increased risk of relapse, or the prediction of whether a subject affected by 
leukemia has an increased risk of developing secondary AML is used in a method of 
choosing a therapy for the subject affected by leukemia. A therapy, as used herein, 
refers to a course of treatment intended to reduce or eliminate the effects or symptoms 
of a disease, in this case leukemia. A therapy regimen will typically comprise, but is 
not limited to, a prescribed dosage of one or more drugs or hematopoietic stem cell 
transplantation. Ideally, a therapy will be beneficial and reduce the disease state, but 
in many instances a therapy will have non-desirable effects as well. 
Thus, the methods of the invention are useful for monitoring the effectiveness of a 
therapy even when non-desirable side-effects are observed. 

Arrays, Computer-Readable Medium, and Kits 

The present invention provides compositions that are useful in determining the 
gene expression profile for a subject affected by leukemia and selecting a reference 
profile that is similar to the subject expression profile. These compositions include 
arrays comprising a substrate having capture probes that can bind specifically to 
nucleic acid molecules that are differentially expressed in leukemia risk groups, 
subjects affected by leukemia who will relapse after conventional therapy or subjects 
affected by leukemia who will develop secondary AML after conventional therapy. 
Also provided is a computer-readable medium having digitally encoded reference 
profiles useful in the methods of the claimed invention. The invention also 
encompasses kits comprising an array of the invention and a computer-readable 
medium having digitally-encoded reference profiles with values representing the 
expression of nucleic acid molecules detected by the arrays. These kits are useful for 
assigning a subject affected by leukemia to a leukemia risk group, predicting whether 
a subject affected by leukemia has an increased risk of relapse, and predicting whether 
a subject affected by leukemia has an increased risk of developing secondary AML. 

The present invention provides arrays comprising capture probes for detecting 
the differentially expressed genes of the invention. By "array" is intended a solid 
support or substrate with peptide or nucleic acid probes attached to said support or 
substrate. Arrays typically comprise a plurality of different nucleic acid or peptide 
capture probes that are coupled to a surface of a substrate in different, known 
locations. These arrays, also described as "microarrays" or colloquially "chips," have 
been generally described in the art, for example, in U.S. Patent Nos. 5,143,854, 
5,445,934, 5,744,305, 5,677,195, 6,040,193, 5,424,186, 6,329,143, and 6,309,831 and 
Fodor et al. (1991) Science 251:767-77, each of which is incorporated by reference in 
its entirety. These arrays may generally be produced using mechanical synthesis 
methods or light directed synthesis methods which incorporate a combination of 
photolithographic methods and solid phase synthesis methods. 

Techniques for the synthesis of these arrays using mechanical synthesis 
methods are described in, e.g., U.S. Patent No. 5,384,261, incorporated herein by 
reference in its entirety for all purposes. Although a planar array surface is preferred, 
the array may be fabricated on a surface of virtually any shape or even a multiplicity 
of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric 
surfaces, fibers such as fiber optics, glass or any other appropriate substrate, see U.S. 
Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, each of which is 
hereby incorporated in its entirety for all purposes. Arrays may be packaged in such a 
manner as to allow for diagnostics or other manipulation of an all-inclusive device. 
See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591 herein incorporated by 
reference. 

The arrays provided by the present invention comprise capture probes that can 
specifically bind a nucleic acid molecule that is differentially expressed in leukemia 
risk groups, a nucleic acid molecule that is differentially expressed in subjects 
affected by leukemia who will relapse after conventional therapy, or a nucleic acid 
molecule that is differentially expressed in subjects affected by leukemia who will 
develop secondary AML after conventional therapy. These arrays can be used to 
measure the expression levels of nucleic acid molecules to thereby create an 
expression profile for use in methods of determining the diagnosis and prognosis for 
leukemia patients, and for monitoring the efficacy of a therapy in these patients as 
described elsewhere herein. 

In some embodiments, each capture probe in the array detects a nucleic acid 
molecule selected from the nucleic acid molecules designated in Tables 2-36, 44-49, 
52, 54-60, 63-68, and 70-74. The designated nucleic acid molecules include those 
differentially expressed in leukemia risk groups selected from the T-ALL risk group 
(Tables 7, 14, 21, 28, 35, 59, and 67), E2A-PBX1 risk group (Tables 3, 10, 17, 24, 31, 
55, 64, and 71), TEL-AML1 risk group (Tables 8, 15, 22, 29, 36, 60, 68, and 74), 
BCR-ABL risk group (Tables 2, 9, 16, 23, 30, 54, 63, and 70), MLL risk group 
(Tables 5, 12, 19, 26, 33, 57, 66, and 73), Hyperdiploid >50 risk group (Tables 4, 11, 
18, 25, 32, 56, 65, and 72), and Novel risk group (Tables 6, 13, 20, 27, 34, and 58), 
those differentially expressed in subjects affected by leukemia who will relapse after 
conventional therapy (Tables 44-48), and those differentially expressed in subjects 
affected by TEL-AML1 who will develop secondary AML after conventional therapy 
(Table 52). 

The arrays of the invention comprise a substrate having a plurality of addresses, 
where each address has a capture probe that can specifically bind a target nucleic 
acid molecule. The number of addresses on the substrate varies with the purpose for 
which the array is intended. The arrays may be low-density arrays or high-density 
arrays and may contain 4 or more, 8 or more, 12 or more, 16 or more, 20 or more, 24 
or more, 32 or more, 48 or more, 64 or more, 72 or more, 80 or more, or 96 or more 
addresses, or 192 or more, 288 or more, 384 or more, 768 or more, 1536 or more, 
3072 or more, 6144 or more, 9216 or more, 12288 or more, 15360 or more, or 18432 
or more addresses. In some embodiments, the substrate has no more than 12, 24, 48, 
96, 192, or 384 addresses, no more than 500, 600, 700, 800, or 900 addresses, or no 
more than 1000, 1200, 1600, 2400, or 3600 addresses. 

The invention also provides a computer-readable medium comprising one or 
more digitally-encoded expression profiles, where each profile has one or more values 
representing the expression of a gene that is differentially expressed in a leukemia risk 
group, the expression level of a gene that is differentially expressed in subjects 
affected by leukemia who will relapse after conventional therapy, or the expression 
level of a gene that is differentially expressed in subjects affected by leukemia who 
will develop secondary AML after conventional therapy. Such profiles are described 
elsewhere herein. In some embodiments, the digitally-encoded expression profiles are 
comprised in a database. See, for example, U.S. Patent No. 6,308,170. 
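
One way such digitally-encoded profiles might be laid out is sketched below. The JSON layout, the probe set identifiers, and the expression values are all hypothetical; the specification does not define a storage format.

```python
import json

# Hypothetical layout: risk group -> {probe set identifier: expression value}.
# The identifiers and values are placeholders, not entries from the Tables.
reference_profiles = {
    "T-ALL": {"probe_set_1": 1250.0, "probe_set_2": 85.5},
    "TEL-AML1": {"probe_set_1": 40.0, "probe_set_2": 910.0},
}

def encode_profiles(profiles):
    """Serialize the profiles to text suitable for storage on any medium."""
    return json.dumps(profiles, sort_keys=True)

def decode_profiles(text):
    """Recover the profiles from their serialized form."""
    return json.loads(text)

encoded = encode_profiles(reference_profiles)
decoded = decode_profiles(encoded)
```

The same dictionary structure could equally be stored as rows in a database table keyed by risk group and probe set.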

The present invention also provides kits useful for diagnosing, treating, and 
monitoring the disease state in subjects affected by leukemia. These kits comprise an 
array and a computer readable medium. The array comprises a substrate having 
addresses, where each address has a capture probe that can specifically bind a nucleic 
acid molecule that is differentially expressed in at least one leukemia risk group, in a 
subject affected by leukemia who will relapse after conventional therapy, or in a 
subject affected by leukemia who will develop secondary AML after conventional 
therapy. The results are converted into a computer-readable medium that has 
digitally-encoded expression profiles containing values representing the expression 
level of a nucleic acid molecule detected by the array. 

Methods of Screening and Therapeutic Targets 

The methods and compositions of the invention may be used to screen test 
compounds to identify therapeutic compounds useful for the treatment of leukemia. 
In one embodiment, the test compounds are screened in a sample comprising primary 
cells or a cell line representative of a particular leukemia risk group. After treatment 
with the test compound, the expression levels in the sample of one or more of the 
differentially-expressed genes of the invention are measured using methods described 
elsewhere herein. Values representing the expression levels of the differentially- 
expressed genes are used to generate a subject expression profile. This subject 
expression profile is then compared to a reference profile associated with the 
leukemia risk group represented by the sample to determine the similarity between the 
subject expression profile and the reference expression profile. Differences between 
the subject expression profile and the reference expression profile may be used to 
determine whether the test compound has anti-leukemogenic activity. 

The test compounds of the present invention can be obtained using any of the 
numerous approaches in combinatorial library methods known in the art, including: 
biological libraries; spatially addressable parallel solid phase or solution phase 
libraries; synthetic library methods requiring deconvolution; the 'one-bead one- 
compound' library method; and synthetic library methods using affinity 
chromatography selection. The biological library approach is limited to polypeptide 
libraries, while the other four approaches are applicable to polypeptide, non-peptide 
oligomer or small molecule libraries of compounds (Lam (1997) Anticancer Drug 
Des. 12:145). 

Examples of methods for the synthesis of molecular libraries can be found in 
the art, for example in DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb 
et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med. 
Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carell et al. (1994) Angew. 
Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 
33:2061; and in Gallop et al. (1994) J. Med. Chem. 37:1233. Libraries of compounds 
may be presented in solution (e.g., Houghten (1992) Biotechniques 13:412-421), or on 
beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555-556), 
bacteria (U.S. Patent No. 5,223,409), spores (U.S. Patent No. 5,223,409), plasmids 
(Cull et al. (1992) Proc. Natl. Acad. Sci. USA 89:1865-1869) or on phage (Scott and 
Smith (1990) Science 249:386-390; Devlin (1990) Science 249:404-406; Cwirla et 
al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:6378-6382; Felici (1991) J. Mol. Biol. 
222:301-310). 

Candidate compounds include, for example, 1) peptides such as soluble peptides, 
including Ig-tailed fusion peptides and members of random peptide libraries (see, e.g., 
Lam et al. (1991) Nature 354:82-84; Houghten et al. (1991) Nature 354:84-86) and 
combinatorial chemistry-derived molecular libraries made of D- and/or L- configuration 
amino acids; 2) phosphopeptides (e.g., members of random and partially degenerate, 
directed phosphopeptide libraries, see, e.g., Songyang et al. (1993) Cell 72:767-778); 3) 
antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic, chimeric, and single 
chain antibodies as well as Fab, F(ab')2, Fab expression library fragments, and epitope- 
binding fragments of antibodies); 4) small organic and inorganic molecules (e.g., 
molecules obtained from combinatorial and natural product libraries); 5) zinc analogs; 6) 
leukotriene A4 and derivatives; 7) classical aminopeptidase inhibitors and derivatives 
of such inhibitors, such as bestatin and arphamenine A and B and derivatives; and 8) 
artificial peptide substrates and other substrates, such as those disclosed herein above 
and derivatives thereof. 

The present invention discloses a number of genes that are differentially 
expressed in leukemia risk groups, in subjects affected by leukemia who will relapse 
after conventional therapy, or in subjects affected by leukemia who will develop 
secondary AML after conventional therapy. These differentially-expressed genes are 
shown in Tables 2-36, 44-48, and 52. Because the expression of these genes is 
associated with leukemia risk factors, these genes may play a role in leukemogenesis. 
Accordingly, these genes and their gene products are potential therapeutic targets that 
are useful in methods of screening test compounds to identify therapeutic compounds 
for the treatment of leukemia. 

The differentially-expressed genes of the invention may be used in cell-based 
screening assays involving recombinant host cells expressing the differentially- 
expressed gene product. The recombinant host cells are then screened to identify 
compounds that can activate the product of the differentially-expressed gene (i.e., 
agonists) or inactivate the product of the differentially-expressed gene (i.e., 
antagonists). 

Any of the leukemogenic functions mediated by the product of the 
differentially-expressed gene may be used as an endpoint in the screening assay for identifying 
therapeutic compounds for the treatment of leukemia. Such endpoint assays include 
assays for cell proliferation, assays for modulation of the cell cycle, assays for the 
expression of markers indicative of leukemia, and assays for the expression level of 
genes differentially expressed in leukemia risk groups as described above. 

Modulators of the activity of a product of a differentially-expressed gene 
identified according to these drug screening assays provided above can be used to treat a 
subject with leukemia. These methods of treatment include the step of administering 
the modulators of the activity of a product of a differentially-expressed gene in a 
pharmaceutical composition as described herein, to a subject in need of such treatment. 

The following examples are offered by way of illustration and not by way of 
limitation. 

EXAMPLES 

EXAMPLE 1: 

To determine if gene expression profiling of leukemic cells could identify 
known biologic ALL subgroups, 327 diagnostic bone marrow (BM) samples were 
analyzed with AFFYMETRIX® oligonucleotide microarrays (Affymetrix Inc., Santa 
Clara, CA) containing 12,600 probe sets. 

In an initial analysis of the gene expression data set (12,600 probe sets in 327 
leukemia samples; greater than 4 x 10^6 data elements), an unsupervised 
two-dimensional hierarchical clustering algorithm was used to group leukemia samples 
with similar gene expression patterns against clusters of similarly expressed genes. 
This analysis clearly identified 6 major leukemia subtypes that corresponded to 
T-ALL, hyperdiploid with >50 chromosomes, BCR-ABL, E2A-PBX1, TEL-AML1, and 
MLL gene rearrangement. Moreover, within the heterogeneous collection of 
leukemias that were not assigned to one of these subtypes, a novel subgroup of 14 
cases was identified that had a distinct gene expression profile. The separation of 
these seven leukemia subgroups was also seen using the multidimensional scaling 
procedure of discriminant analysis with variance (DAV), in which the data are 
reduced into component dimensions consisting of linear combinations of 
discriminating genes. For example, using the three component dimensions that 
accounted for 72.8% of the variance of gene expression among the subgroups, it was 
possible to distinguish T-ALL (43 cases), E2A-PBX1 (27 cases), TEL-AML1 (79 
cases) and hyperdiploid >50 (64 cases) from the remaining ALL subtypes (114 cases). 
Similarly, three different components that accounted for an additional 16.1% of 
the variance in gene expression made it possible to discriminate cases with BCR-ABL 
(15 cases), MLL gene rearrangement (20 cases) and the novel subgroup of ALL (14 
cases). 
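
The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names are illustrative.

```python
probe_sets, samples = 12_600, 327
data_elements = probe_sets * samples  # 4,120,200, i.e. greater than 4 x 10^6

# Subgroups separated by the first three component dimensions.
first_pass = {"T-ALL": 43, "E2A-PBX1": 27, "TEL-AML1": 79, "Hyperdiploid >50": 64}
remaining = samples - sum(first_pass.values())  # 327 - 213 = 114

# Subgroups separated by the second set of three component dimensions.
second_pass = {"BCR-ABL": 15, "MLL": 20, "Novel": 14}
unassigned = remaining - sum(second_pass.values())  # 114 - 49 = 65

# Share of the variance in gene expression covered by both sets of components.
variance_covered = round(72.8 + 16.1, 1)  # 88.9 (percent)
```

The 65 cases left over agree with the number of cases that the text reports as not clustering into any of the leukemia subtypes.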

Statistical methods were used to identify those genes that best define the 
individual groups. Expression profiles were obtained using the top 40 genes per 
subgroup as selected by a chi-square metric. Distinct groups of genes distinguish 
cases defined by E2A-PBX1, MLL, T-ALL, hyperdiploid >50, BCR-ABL, the novel 
subgroup, and TEL-AML1. In addition to these specific subgroups, 65 cases (20% of 
the total) were identified that did not cluster into any of the leukemia subtypes. The 
expression profiles of these latter cases varied markedly, suggesting that they 
represent a heterogeneous group of leukemias. Nearly identical results were obtained 
when the hierarchical clustering was performed with genes selected by other 
statistical metrics. 
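
A chi-square ranking of genes for one subgroup might look like the following sketch. It is not the procedure used in the study: as an assumption for illustration, each gene is dichotomized at its median across samples, and the resulting 2 x 2 table (high or low expression versus in or out of the subgroup) is scored with the ordinary chi-square statistic. Gene names and values are made up.

```python
from statistics import median

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def rank_genes(expression, in_subgroup, top_k=40):
    """Rank genes by chi-square score.

    expression: {gene: [value per sample]}; in_subgroup: [bool per sample].
    """
    scores = {}
    for gene, values in expression.items():
        cut = median(values)  # assumed cut point; the text does not state one
        high = [v > cut for v in values]
        a = sum(h and s for h, s in zip(high, in_subgroup))              # high, in subgroup
        b = sum(h and not s for h, s in zip(high, in_subgroup))          # high, other cases
        c = sum((not h) and s for h, s in zip(high, in_subgroup))        # low, in subgroup
        d = sum((not h) and (not s) for h, s in zip(high, in_subgroup))  # low, other cases
        scores[gene] = chi_square_2x2(a, b, c, d)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical data: six samples, the first three belong to the subgroup.
labels = [True, True, True, False, False, False]
expression = {
    "gene_x": [9.0, 8.0, 9.0, 1.0, 2.0, 1.0],  # tracks the subgroup
    "gene_y": [5.0, 1.0, 6.0, 2.0, 7.0, 3.0],  # unrelated to the subgroup
}
top_genes = rank_genes(expression, labels, top_k=1)
```

With real data, `top_k=40` would reproduce the list size mentioned in the text, one ranking per subgroup.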

For T-ALL, two gene clusters that discriminated this subtype from B-lineage 
cases were identified. One cluster was expressed at high levels and the other was 
expressed at low levels. In contrast, the top-ranked discriminating genes for each of 
the other leukemia subtypes consisted primarily of genes that were overexpressed 
within the specific leukemia subtype. With the exception of T-ALL, the identified 
expression profiles do not represent a specific differentiation stage of the leukemic 
blasts. For example, although E2A-PBX1 is almost exclusively found in ALLs with a 
pre-B cell immunophenotype (Hunger (1996) Blood 87:1211-24), the identified 
expression profile was specific for the E2A-PBX1 genetic lesion and not the pre-B 
immunophenotype. 

To confirm that the microarray analysis provided an accurate reflection of 
actual gene expression levels, the microarray data were compared with results for RNA 
levels obtained by real-time RT-PCR (5 genes). In addition, the corresponding 
protein levels were assessed by immunophenotype analysis performed by flow 
cytometry (using nine specific cell surface antigens). A very high degree of 
correlation was observed between the levels of RNA expression detected by 
quantitative RT-PCR and microarray analysis. Similarly, in agreement with results 
from immunophenotyping, T-lineage restricted RNA expression was observed for 
CD2, CD3, and CD8, whereas B-lineage restricted expression was observed for 
CD19 and CD22. In addition, the level of CD10 RNA expression closely correlated 
with protein levels, with high expression detected in TEL-AML1 leukemias, 
intermediate levels in E2A-PBX1 and low to undetectable expression in cases with 
rearrangements of MLL. Thus, microarray analysis provides an accurate reflection of 
expression levels for most genes, and can be used to accurately detect the expression 
of the more common surface antigens used in the diagnostic evaluation of pediatric 
ALL patients. 

The majority of the leukemia subtype-specific genes identified through this 
study were not previously known to have a restricted pattern of expression. In 
addition to their use as diagnostic and subclassification markers, these genes provide 
unique insights into the underlying biology of the different leukemia subtypes. For 
example, E2A-PBX1 leukemias were characterized by high expression of the c-Mer 
receptor tyrosine kinase (MERTK), a known transforming gene (Graham et al. (1994) 
Cell Growth Differ. 5:647-657; and Georgescu et al. (1999) Mol. Cell Biol. 
19:1171-81), suggesting that c-Mer may be involved in the abnormal growth of these cells. 
Similarly, HOXA9 and MEIS1 were exclusively expressed in cases having MLL 
rearrangements, indicating that they may be directly involved in MLL-mediated 
alterations in the growth of the leukemic cells. Interestingly, high expression of 
MTG16, a homologue of ETO (Gamou et al. (1998) Blood 91:4028-4037), was found 
in TEL-AML1 cases. Alteration of ETO family members in both t(8;21) acute 
myeloid leukemia (by translocation) (Downing (1999) Br. J. Hematol. 106:296-308) 
and TEL-AML1 (by altered expression) suggests that alteration in the biologic 
function of ETO genes is mechanistically involved in these leukemias. 

Little is known about the underlying molecular pathogenesis of hyperdiploid ALL 
with >50 chromosomes, which clinically is distinct from hyperdiploid cases having 47-50 
chromosomes. This distinction is supported by the marked differences in gene 
expression profiles between these two subgroups. Although hyperdiploid >50 ALLs 
have an excellent prognosis, the specific genetic lesions responsible for the aberrant 
proliferation in these cases remain poorly understood. Interestingly, almost 70% of 
the genes that define this subgroup are localized to either chromosome X or 21. 
Moreover, the class-defining genes on chromosome X were overexpressed in the 
hyperdiploid >50 ALLs irrespective of whether the leukemic blasts had 
a trisomy of this chromosome (data not shown). Detailed analysis will be required to 
determine the specific signaling pathways that are disrupted as a result of the altered 
expression of these genes. Lastly, the novel subgroup of ALL was defined by high 
expression of a group of genes, including the receptor phosphatase PTPRM, and 
LHFPL2, a gene that is a part of the LHFP-like gene family, the founding member of 
which was identified as the target of a lipoma-associated chromosomal translocation 
(Petit et al. (1999) Genomics 57:438-41). 

Expression Profiling as a Diagnostic Tool 

A major goal of this study was to develop a single platform of expression 
profiling to accurately identify the known, prognostically important leukemia 
subtypes. To this end, computer-assisted learning algorithms were used to develop an 
expression-based leukemia classification. Through a reiterative process of error 
minimization, these algorithms learn to recognize the optimal gene expression 
patterns for a leukemia subtype. Classification was approached using a decision tree 
format, in which the first decision was T-ALL versus B-lineage (non-T-ALL), and 
then within the B-lineage subset, cases were sequentially classified into the known 
risk groups characterized by the presence of E2A-PBX1, TEL-AML1, BCR-ABL, 
MLL chimeric genes, and lastly hyperdiploid with >50 chromosomes. Cases not 
assigned to one of these classes were left unassigned. Classification was performed 
using a Support Vector Machine (SVM) algorithm with a set of discriminating genes 
selected by a correlation-based feature selection (CFS), or if this method selected 
greater than 20 genes for a particular class, by using the top 20 ranked genes selected 
by a chi-square metric, or one of the other metrics detailed in the Experimental 
Procedures section. This approach resulted in an accurate class prediction in a 
randomly selected training set that consisted of two-thirds of the total cases (215 
cases). When this classification model was then applied to a blind test set consisting 
of the remaining 112 samples, an overall accuracy of 96% was achieved for class 
assignment. The number of genes required for optimal class assignment varied 
between classes. A single gene was sufficient to give 100% accuracy for both T-ALL 
and E2A-PBX1, whereas 7-20 genes were required for prediction of the other classes. 
Only slight differences were observed in the prediction accuracy of individual classes 
when the process was repeated using genes selected by a number of other metrics, 
including T-statistics, a novel metric referred to as Wilkins', or genes selected by a 
combination of self organizing maps (SOM) and DAV. Moreover, nearly identical 
results were obtained when the various sets of selected genes were used in a number 
of different supervised learning algorithms, including K-Nearest Neighbor (k-NN), 
Artificial Neural Network (ANN), and prediction by collective likelihood of emerging 
patterns (PCL). 
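
The decision tree format and the gene-selection rule described above can be sketched as follows. The binary classifiers here are simple threshold stubs standing in for the trained SVMs, and the marker names and cutoff are hypothetical; only the ordering of the decisions and the 20-gene limit come from the text.

```python
# Order in which the binary decisions are made, as described in the text.
ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50"]

def select_genes(cfs_genes, ranked_genes, limit=20):
    """Use the CFS-selected genes unless there are more than `limit` of them;
    in that case fall back to the top `limit` genes of a ranked list."""
    if len(cfs_genes) > limit:
        return list(ranked_genes[:limit])
    return list(cfs_genes)

def threshold_classifier(gene, cutoff):
    """Stub binary classifier (NOT an SVM): positive if one marker exceeds a cutoff."""
    return lambda profile: profile.get(gene, 0.0) > cutoff

def classify(profile, classifiers, order=ORDER):
    """Walk the decisions in order; the first positive classifier assigns the class."""
    for label in order:
        if classifiers[label](profile):
            return label
    return "unassigned"

# Hypothetical markers, one per class, all with the same made-up cutoff.
classifiers = {label: threshold_classifier("marker_" + label, 100.0) for label in ORDER}

example = classify({"marker_MLL": 500.0, "marker_TEL-AML1": 500.0}, classifiers)
```

Because the decisions are sequential, a profile that would satisfy two classifiers is assigned to whichever class is tested first, which is why the order of the list matters.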

Four cases initially appeared to be misclassified as TEL-AML1 by gene 
expression analysis since they lacked a detectable chimeric transcript by RT-PCR. 
Upon further analysis by FISH, however, one of these cases was shown to have a 
TEL-AML1 fusion, presumably a variant rearrangement that could not be detected 
with the amplification primers used for the TEL-AML1 RT-PCR assay. In each of 
the three remaining cases, re-examination of the karyotypes revealed translocations 
involving the p arm of chromosome 12. FISH analysis demonstrated that two of these 
cases had deletion of one TEL allele, whereas the remaining case had a partial 
deletion of one TEL allele. Thus, the identified expression profiles appear to reflect 
an abnormality of the TEL transcription factor, and may in fact provide a more 
accurate means of identifying a specific leukemia subtype defined by its underlying 
biology. Collectively, these data demonstrate that the single platform of gene 
expression profiling can accurately identify the known prognostic subtypes of ALL. 

Use of Expression Profiles to Identify Patients at High Risk of Treatment Failure 

Relapse and the development of therapy-induced acute myeloid leukemia 
(AML) are the major causes of treatment failure in pediatric ALL. To determine if 
expression profiling might further enhance the ability to identify patients who are 
likely to relapse, the expression profiles of four groups of leukemic samples were 
compared. The groups of samples used for this comparison were: (i) diagnostic 
samples from patients who developed hematological relapses (n = 32); (ii) diagnostic 
samples from patients who remained in continuous complete remission (CCR) (n = 
201); (iii) diagnostic samples from patients who developed therapy-induced AML (n 
= 16); and (iv) leukemic samples collected at the time of ALL relapse (n = 25). Using 
DAV, distinct gene expression profiles were identified for each of these groups. 

To further assess the predictive power of the different gene expression 
profiles, supervised learning algorithms were used. Because of the overwhelming 
differences in the expression profiles of the different leukemia subtypes, it was not 
possible to identify a single expression signature that would predict relapse 
irrespective of the genetic subtype. However, within individual leukemic subtypes, 
distinct expression profiles could be defined that predicted relapse. Class assignment 
was performed using an SVM supervised learning algorithm with discriminating genes 
selected by CFS, or if this method returned >20 genes, the top 20 genes selected by 
T-statistics. For both the T-lineage and hyperdiploid >50 subgroups, expression profiles 
identified those cases that went on to relapse with an accuracy of 97% and 100%, 
respectively, as assessed by cross validation. Moreover, the predictive accuracy was 
statistically significant when compared to results from an analysis of 1000 random 
permutations of the specific patient data set. Similarly, expression profiles predictive 
of relapse were identified for TEL-AML1, MLL, and cases that lacked any of the known 
genetic risk features. Although the predictive accuracy of these latter expression 
profiles was very high as assessed by cross validation, it did not reach statistical 
significance when compared to results from an analysis of 1000 random permutations 
of the same patient data set, likely owing to the limited number of cases. The 
patterns of expression for a combination of genes, rather than expression levels of a 
single gene, were found to have the greatest predictive accuracy. Since few known 
risk-stratifying biologic features have been previously identified for either T-ALL or 
hyperdiploid >50 ALL, the results suggest that the identified expression profiles 
provide independent risk stratifying information. 
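
The comparison of cross-validated accuracy against 1000 random permutations can be illustrated with the sketch below. To keep it self-contained, a nearest-centroid classifier is substituted for the SVM used in the study, leave-one-out cross validation is assumed, and the two-gene sample data are invented; none of these choices is taken from the specification.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length expression vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loocv_accuracy(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    correct = 0
    for i, held_out in enumerate(samples):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {
            lab: centroid([s for s, l in train if l == lab])
            for lab in sorted({l for _, l in train})
        }
        predicted = min(centroids, key=lambda lab: sq_dist(held_out, centroids[lab]))
        correct += predicted == labels[i]
    return correct / len(samples)

def permutation_p_value(samples, labels, n_permutations=1000, seed=0):
    """Fraction of label shufflings whose accuracy matches or beats the observed one."""
    rng = random.Random(seed)
    observed = loocv_accuracy(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loocv_accuracy(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical two-gene profiles for ten patients (values are made up).
samples = [[1.0, 1.2], [1.3, 0.9], [0.8, 1.1], [1.1, 1.4], [0.9, 0.8],
           [9.0, 9.2], [9.3, 8.9], [8.8, 9.1], [9.1, 9.4], [8.9, 8.8]]
labels = ["CCR"] * 5 + ["relapse"] * 5
accuracy, p_value = permutation_p_value(samples, labels)
```

With few cases, only a small number of distinct label arrangements exist, so even a perfect cross-validated accuracy may fail to reach significance, which is consistent with the limitation noted in the text for the smaller subgroups.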

A distinct expression profile was identified in the ALL blasts from patients 
who developed therapy-induced AML. Because secondary AML is thought to arise 
from a hematopoietic stem cell that is distinct from that giving rise to the primary 
leukemia, it is difficult to understand how the biology of the original ALL blasts 
could predict the risk of developing a therapy-induced complication. However, when 
the accuracy of expression profiling was evaluated within the TEL-AML1 
subgroup, a distinct expression signature consisting of 20 genes was defined. This 
profile identified, with 100% accuracy in cross validation, all patients who developed 
secondary AML, with a p value of 0.031 as assessed by comparison to results from an 
analysis of 1000 random permutations of the patient data set. Genes within this 
signature included RSU1, a suppressor of the Ras signaling pathway, and Msh3, a 
mismatch repair enzyme. 

Overview of Experimental Procedures 

A. Tumor Samples 

The diagnosis of ALL was based on the morphologic evaluation of the bone 
marrow and on the pattern of reactivity of the leukemic blasts with a panel of 
monoclonal antibodies directed against lineage-associated antigens. A total of 389 
pediatric acute leukemia samples were analyzed in this study, from which high quality 
gene expression data were obtained on 360 (93%). The successfully analyzed samples 
included 332 diagnostic BM, 3 diagnostic peripheral blood (PB), and 25 relapsed 
ALL samples from BM or PB. 264 (79%) of the diagnostic ALL BM samples and all 
relapse samples were from patients enrolled on St. Jude Children's Research Hospital 
Total Therapy Studies XIIIA or XIIIB and corresponded to 64% of the patients 
treated on these protocols. The details of these protocols have been previously 
published (Pui et al. (2000) Leukemia 14:2286-94). The remaining samples were 
obtained from patients treated on St. Jude Total Therapy Studies XI, XII, XIV, XV, or 
by best clinical management. All protocols and consent forms were approved by the 
hospital's institutional review board, and informed consent was obtained from 
parents, guardians, or patients (as appropriate). The composition of the data sets used 
for the identification of gene expression profiles predictive of specific genetic 
subtypes, hematological relapse, and risk of developing secondary AML is described 
below. 

B. Gene Expression Profiling 

RNA was extracted from cryopreserved mononuclear cell suspensions from 
diagnostic BM aspirates or PB samples using TRIZOL® (Invitrogen Corp., Carlsbad, 
California) according to the manufacturer's instructions, and the RNA integrity was 
assessed by using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, 
CA). cDNA was synthesized using a T-7 linked oligo-dT primer and cRNA was then 
synthesized with biotinylated UTP and CTP. The labeled RNA was then fragmented 
and hybridized to HG_U95Av2 oligonucleotide arrays (Affymetrix Incorporated, 
Santa Clara, CA) according to the manufacturer's instructions. 

Arrays were scanned using a laser confocal scanner (Agilent) and the 
expression value for each gene was calculated using AFFYMETRIX® Microarray 
Software version 4.0. The average intensity difference (AID) values were normalized 
across the sample set and minimum quality control standards were established for 
including a sample's hybridization data in the study. 10% of samples were run in 
duplicate to ensure consistency of data acquisition throughout the study. A high level 
of reproducibility was observed between replicate samples, with fewer than 1% of 
genes showing a variation in average intensity difference of greater than 2-fold. 

C. Statistical Analysis 

Unsupervised hierarchical clustering, principal component analysis (PCA), 
discriminant analysis with variance (DAV), and self organizing maps (SOM) were 
performed using GeneMaths software (version 1.5, Applied Maths, Belgium). Data 
reduction to define the genes most useful in class distinction was performed using a 
variety of metrics as detailed below. Genes selected by the various metrics were used 
in supervised learning algorithms to build classifiers that could identify the specific 
genetic or prognostic subgroups. The algorithms used included k-Nearest Neighbors 
(k-NN), Support Vector Machine (SVM), prediction by collective likelihood of 
emerging patterns (PCL), an artificial neural network (ANN), and weighted voting. 
Performance of each model was initially assessed by leave-one-out cross validation 
on a randomly selected stratified training set consisting of two-thirds of the total 
cases. True error rates of the best performing classifiers were then determined using 
the remaining third of the samples as a blinded test group. Details of the individual 
metrics and supervised learning algorithms are described below. 
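
The evaluation scheme described above (a stratified two-thirds training set assessed by leave-one-out cross validation, with the remaining third held back as a blinded test group) can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and a 1-nearest-neighbor rule with squared Euclidean distance is used purely as a stand-in for the classifiers named in the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)

def nearest_neighbor(train_x, train_y, query):
    """Predict the label of the closest training sample (1-NN stand-in)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return train_y[best]

def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once and predict it from all the others."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        correct += nearest_neighbor(rest_x, rest_y, xs[i]) == ys[i]
    return correct / len(xs)
```

In such a scheme the test indices are used only once, after model selection on the training set is complete, so that the reported error rate is not biased by the selection step.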

Detailed Experimental Procedures 

A. RNA Extraction, Labeling, Hybridization, and Data analysis 

Mononuclear cell suspensions from diagnostic BM aspirates or peripheral 
blood (PB) samples were prepared from each patient and an aliquot was 
cryopreserved. RNA was extracted using TRIZOL® following the manufacturer's 
recommended protocol as described above. RNA integrity was assessed by 
electrophoresis on the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). 

First and second strand cDNA were synthesized from 5-15 µg of total RNA 
using the Superscript Double-Stranded cDNA Synthesis Kit (Invitrogen Corp., 
Carlsbad, California) and an oligo-dT24-T7 (5'-GGC CAG TGA ATT GTA ATA 
CGA CTC ACT ATA GGG AGG CGG-3'; SEQ ID NO:1) primer according to the 
manufacturer's instructions. cRNA was synthesized and labeled with biotinylated 
UTP and CTP by in vitro transcription using the T7 promoter coupled double stranded 
cDNA as template and the T7 RNA Transcript Labeling Kit according to the 
manufacturer's instructions (Enzo Diagnostics Inc., Farmingdale NY). Briefly, double 
stranded cDNA synthesized from the previous steps was washed twice with 70% 
ethanol and resuspended in 22 µl RNase-free water. The cDNA was incubated with 4 
µl of 10X reaction buffer, 1 µl of biotin labeled ribonucleotides, 2 µl of DTT, 1 µl 
of RNase inhibitor mix and 2 µl 20X T7 RNA polymerase for 5 hours at 37°C. The 
labeled cRNA was separated from unincorporated ribonucleotides by passing through 
a CHROMA SPIN-100 column (Clontech, Palo Alto, CA) and precipitated at -20°C 
for 1 hr to overnight. 

The cRNA pellet was resuspended in 10 µl RNase-free H2O and 10.0 µg was 
fragmented by heat and ion-mediated hydrolysis at 95°C for 35 minutes in 200 mM 
Tris-acetate, pH 8.1, 500 mM KOAc, 150 mM MgOAc. The fragmented cRNA was 
hybridized for 16 hr at 45°C to HG_U95Av2 AFFYMETRIX® oligonucleotide arrays 
(Affymetrix, Santa Clara, CA) containing 12,600 probe sets from full-length 
annotated genes together with additional probe sets designed to represent EST 
sequences. Arrays were washed at 25°C with 6X SSPE (0.9 M NaCl, 60 mM 
NaH2PO4, 6 mM EDTA, 0.01% Tween 20) followed by a stringent wash at 50°C with 
100 mM MES, 0.1 M NaCl, 0.01% Tween 20. The arrays were then stained with 
phycoerythrin conjugated streptavidin (Molecular Probes, Eugene, OR). 

Arrays were scanned using a laser confocal scanner (Agilent, Palo Alto, CA) 
and the expression value for each gene was calculated using AFFYMETRIX® 
Microarray software (MAS 4.0). The signal intensity for each gene was calculated as 
the average intensity difference (AID), represented by [Σ(PM - MM)/(number of 
probe pairs)], where PM and MM denote perfect-match and mismatch probes, 
respectively. Expression values were normalized across the sample set by scaling the 
average of the fluorescent intensities of all genes on an array to a constant target 
intensity of 2500; any AID over 45,000 was then capped at a value of 45,000. All 
AIDs less than 100, including negative values and absent calls, were converted to a 
value of 1. In addition, a variation filter was used to eliminate any probe set in which 
fewer than 1% of the samples had a present call, or for which the Max AID - Min AID 
across the sample set was less than 100. The average intensity differences for each of 
the remaining genes were analyzed. For some metrics the data were log transformed 
prior to analysis. The minimum quality control values required for inclusion of a 
sample's hybridization data in the study were 10% or greater present calls, a 
GAPDH/Actin 3'/5' ratio <5, and use of a scaling factor that was within 3 standard 
deviations from the mean of the scaling values of all chips analyzed. 
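
The normalization and filtering steps described above can be summarized in a short sketch. The function names and default arguments below are hypothetical and simply restate the thresholds given in the text (target intensity 2500, ceiling 45,000, floor 100, 1% present calls, minimum range 100); a plain arithmetic mean is assumed for the scaling step, which may differ from the averaging actually performed by the array software.

```python
def scale_array(aids, target=2500.0):
    """Scale one array so that its mean AID equals the target intensity."""
    factor = target / (sum(aids) / len(aids))
    return [a * factor for a in aids], factor

def clip_aid(value, floor=100.0, ceiling=45000.0):
    """Cap high values at the ceiling and set values under the floor to 1."""
    if value > ceiling:
        return ceiling
    if value < floor:
        return 1.0
    return value

def passes_variation_filter(values, present_calls, min_present=0.01, min_range=100.0):
    """Keep a probe set only if enough samples call it present and its
    max-minus-min AID across the sample set reaches the minimum range."""
    present_fraction = sum(present_calls) / len(present_calls)
    return present_fraction >= min_present and (max(values) - min(values)) >= min_range
```

The scaling factor returned by `scale_array` is the quantity that the quality control criterion above compares against the mean of the scaling values of all chips.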

The average percent present calls for the overall data set was 29.7%, and for 
each of the genetic subgroups was BCR-ABL (31.1%), E2A-PBX1 (28.9%), Hyper 
>50 (31%), MLL (29.8%), T-ALL (29.1%), TEL-AML1 (28.5%), Novel (30.2%), and 
others (31.1%). In addition, each sample had >75% blasts. The average percentage 
blasts for the overall data set used to define the genetic subtypes was 93%, and for 
each genetic subtype was BCR-ABL (92%), E2A-PBX1 (96%), Hyper >50 (93%), 
MLL (93%), T-ALL (91%), TEL-AML1 (92%), Novel (95%), and others (94%). 

B. Reproducibility of Microarray Data 
The reproducibility of the AFFYMETRIX® microarray system was assessed 
by comparing the gene expression profiles of RNA extracted from duplicate 
cryopreserved diagnostic leukemic samples from 23 patients, and of single RNA 
samples from 13 patients analyzed on two separate arrays. The mean number of 
probe sets that displayed a ≥2-fold difference in expression between separately 
extracted but paired RNA samples was 144, and for single RNA samples analyzed on 
two separate occasions was 133. Moreover, very few probe sets were found to have a 
≥3-fold difference in expression levels between replicate samples. The observed 
number of probe sets showing a difference in expression values represents less than 
2% of the total number of probe sets on the microarray, and thus these data suggest 
that the AFFYMETRIX® microarray system has a very high degree of 
reproducibility. 
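
The replicate comparison above amounts to counting probe sets whose two measurements differ by at least a given fold. A minimal sketch follows; the function name `count_fold_changes` is hypothetical, and the inputs are assumed to be positive values (consistent with the conversion of low or negative AIDs to 1 described earlier).

```python
def count_fold_changes(run_a, run_b, fold=2.0):
    """Count probe sets whose two measurements differ by at least `fold`.

    Values are assumed to be positive, so the ratio is always defined.
    """
    count = 0
    for a, b in zip(run_a, run_b):
        ratio = max(a, b) / min(a, b)
        if ratio >= fold:
            count += 1
    return count
```

For reference, the reported mean of 144 discordant probe sets out of roughly 12,600 corresponds to about 1.1% of the array, in line with the "less than 2%" stated above.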

C. Comparison of Expression Profiles from PB and BM leukemia samples 
Matched BM and PB samples that contained ≥80% leukemic blasts were 
obtained from 10 patients and the RNA was extracted and assessed by microarray 
analysis. A very high level of correlation was observed between the expression 
profiles of BM and PB, with only 189 probe sets having a greater than 2-fold 
difference in expression. No genes were found to be consistently over- or under- 
expressed in one sample type. These data demonstrate that there are minimal 
differences in the gene expression profiles of leukemic blasts obtained from BM or 
PB, and that diagnostic gene expression profiling is possible on samples obtained 
from the PB. 

D. RT-PCR Results 

Real-time TAQMAN® RT-PCR assays (Applied Biosystems, Foster City, 
CA) were performed to independently determine the level of mRNA for five genes 
that were found by microarray analysis to be predictive of either T-lineage ALL 
(CD3δ, CD3D antigen delta polypeptide TiT3 complex; MAL, mal T-cell 
differentiation protein; and PRKCQ, protein kinase C theta) or E2A-PBX1 expressing 
ALL (MERTK, c-Mer proto-oncogene tyrosine kinase and KIAA802). The RNA 
samples analyzed included four samples each of E2A-PBX1 and T-ALL, and two 
samples each from the remaining subtypes (BCR-ABL, MLL, TEL-AML1, 
Hyperdiploid >50, Hyperdiploid 47-50, Hypodiploid, Pseudodiploid, and normal). 
Whenever possible, the forward and reverse primers were designed in different exons 
so that DNA contamination would not be a concern. In the case of MAL where this 
was not clear, the RNA was treated for 15 minutes at room temperature with 1.0 unit 
of DNase I (Invitrogen Corp., Carlsbad, California) using the Invitrogen protocol to 
remove any contaminating DNA. 

Thirty-three ng of RNA from each sample was reverse transcribed using 
random hexamers and Multiscribe Reverse Transcriptase (Applied Biosystems, Foster 
City, CA) in a total volume of 10 µl. Real time PCR was performed on an Applied 
Biosystems PRISM® 7700 Sequence Detection System (Applied Biosystems). All 
probes were labeled at the 5' end with FAM (6-carboxy-fluorescein) and at the 3' end 
with TAMRA (6-carboxy-tetramethyl-rhodamine). 

The PCR reactions were performed in a total volume of 50 µl containing 10 µl 
of the reverse transcriptase product, 300 nM each of the forward and reverse primers, 
100 nM of probe, 1X master mix and 1 µl of AMPLITAQ GOLD® DNA polymerase 
(Applied Biosystems). Following a 10 minute incubation at 95°C to activate the 
polymerase, samples were denatured at 95°C for 15 seconds, then annealed and 
extended at 60°C for 1 minute, for a total of 40 cycles. The RNA from each sample 
was also amplified using primers and probes to RNase P (Applied Biosystems) for use 
in normalization according to the manufacturer's instructions. Negative controls were 
included in each run. Standard curves were generated for T-cell markers and RNase P 
using MOLT4 RNA, a T-cell leukemia cell line, and for the E2A-PBX1 markers and 
RNase P using a leukemia cell line, 697, that contains an E2A-PBX1 fusion. 

The expression levels of the predictive genes and RNase P were determined in 
each of the 24 ALL samples. A ratio was then calculated by taking the expression 
value for the specific gene and dividing it by the expression level of RNase P in the 
sample. These ratios were then compared to the values obtained from the 
AFFYMETRIX® chip data from the same RNA sample. The raw AFFYMETRIX® 
chip data were scaled as described and then normalized using the 3'GAPDH value for 
each sample, yielding a normalized ratio. The TAQMAN® results and 
AFFYMETRIX® chip ratios were then log transformed and compared. Since the 
markers selected for TAQMAN® analysis were predictors for either E2A-PBX1 or T- 
ALLs, each gene was expected to have four RNA samples with high and 20 samples 
with low expression. For each gene evaluated, an average expression value for both 
the TAQMAN® results and AFFYMETRIX® data was calculated for all samples in 
the up-regulated group, and similarly, for the samples in the down-regulated group. 






E. Comparison of Real-time RT-PCR Data and AFFYMETRIX® Chip Data 

The normalized gene expression ratios for the TAQMAN® data (gene/RNase 
P) and for the AFFYMETRIX® microarray data (AID for a gene/AID for GAPDH) 
were log transformed and then the average expression values for each gene were 
calculated in the four samples in which its expression was expected to be up-regulated 
and separately in the 20 samples in which its expression was expected to be down- 
regulated. For example, for genes that were expected to be up-regulated in T-ALL 
(CD3δ, MAL, and PRKCQ), the log expression ratios in the T-ALL samples were 
averaged to give the up-regulated value and the log expression ratios of each gene in 
the non-T-ALL cases were averaged to give the down-regulated value. 
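
The ratio-and-average step described above can be sketched as follows. The function name `mean_log_ratio` is hypothetical, and a base-10 logarithm is assumed because the text does not state which base was used.

```python
import math

def mean_log_ratio(gene_values, reference_values):
    """Average of log10(gene / reference) over a group of samples.

    `reference_values` would hold RNase P levels for the RT-PCR data or
    GAPDH AIDs for the array data; both inputs must be positive.
    """
    logs = [math.log10(g / r) for g, r in zip(gene_values, reference_values)]
    return sum(logs) / len(logs)
```

Calling the function once on the samples expected to be up-regulated and once on the remaining samples yields the two per-gene averages that are compared between platforms.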

In both the TAQMAN® and the microchip array analysis, MERTK and 
KIAA802 were very highly expressed in the diagnostic samples containing E2A- 
PBX1, and expressed at low levels in all of the other samples. Likewise, PRKCQ, 
CD3δ, and MAL showed high levels of expression in T cells by both methodologies 
in comparison with non-T cells. The normalized ratios from the TAQMAN® assay 
were plotted against the normalized ratios from the microchip array for both the up- 
regulated and down-regulated genes. The correlation between TAQMAN® results 
and the microchip array results was 70%, indicating that the same pattern of gene 
expression was seen in both analyses. MERTK expression was extremely high in two 
of the E2A-PBX1 patient samples by TAQMAN® analysis. Removal of the MERTK gene 
from the analysis resulted in a correlation of 91% between the TAQMAN® results 
and the microchip array results. 
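
A correlation between two sets of log ratios, such as the one reported above, could be computed along the following lines. The text does not state which correlation measure was used; the Pearson coefficient shown here is an assumption, and the function name `pearson` is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Because this coefficient is sensitive to extreme values, a few samples with very high expression of one gene can lower it noticeably, which is consistent with the change observed when MERTK is removed from the comparison.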

F. Comparison of AFFYMETRIX® Microarray Chip Results and 
Immunophenotype Results 

Leukemic blasts at the time of diagnosis were analyzed for expression of 
lineage restricted cell surface antigens using phycoerythrin- or fluorescein 
isothiocyanate-conjugated monoclonal antibodies against CD2, CD3ε, CD4, CD5, 
CD7, CD8, CD10, CD19, and CD22 (Becton Dickinson Immunocytometry Systems, 
San Jose, CA, USA). Data were obtained using a COULTER® EPICS XL™ 
(Beckman Coulter, Miami, FL), a COULTER® ELITE™ (Beckman Coulter), or a 
BD FACSCalibur™ flow cytometer (Becton Dickinson, San Jose, CA). The 
expression patterns for these antigens were then compared to gene expression patterns 
for the AFFYMETRIX® chip sites specified for CD2 (1 probe set, 40738_at), CD3δ 
(1 probe set, 38319_at), CD3ε (1 probe set, 36277_at), CD3ζ (1 probe set, 37078_at), 
CD3γ (1 probe set, 39226_at), CD4 (5 probe sets, 856_at, 1146_at, 35517_at, 
34003_at, and 37942_at), CD5 (1 probe set, 32953_at), CD7 (1 probe set, 771_s_at), 
CD8α (1 probe set, 40699_at), CD8β (1 probe set, 39239_at), CD10 (1 probe set, 
1389_at), CD19 (2 probe sets, 1096_g_at and 1116_at), and CD22 (2 probe sets, 
38521_at and 38522_s_at). As a control, the performance of the AFFYMETRIX® 
microarray probe sets was also assessed using RNA isolated from flow sorted single 
positive CD4+ and CD8+ thymocytes, and CD10+/CD19+ bone marrow cells. High 
RNA expression was observed in T-ALL for the T-lineage restricted genes CD2, 
CD3δ, ε, and ζ, CD8α, and CD7, and in B-lineage ALLs for the B-cell restricted 
genes CD19 and CD22. A similar high level of correlation was observed between 
RNA and protein expression for CD10. The observed low expression levels of T-cell 
restricted genes in B-cell cases, and B-cell restricted genes in T-ALLs, are consistent 
with the low level of normal contaminating lymphocytes present in the diagnostic 
marrow samples analyzed. 

G. Patient Data Set 

A total of 389 pediatric acute leukemia samples were analyzed in this study, 
from which high quality gene expression data were obtained on 360 (93%). The 
successfully analyzed samples included: 332 diagnostic bone marrows (BM), 3 
diagnostic peripheral blood samples (PB), and 25 relapse ALL samples from BM or 
PB. Of the diagnostic ALL BM samples, 264 (79%), together with all relapse samples, 
were from patients treated on St. Jude Children's Research Hospital Total Therapy 
Studies XIIIA or XIIIB and correspond to 64% of the patients treated on these 
protocols. The details of these protocols are described in Pui et al., "Risk-adapted 
treatment for acute lymphoblastic leukemia: findings from St. Jude Children's 
Research Hospital," Haematology and Blood Transfusions, 1997, pp 629-37, 
Springer-Verlag, Berlin and in Pui et al. (2000) Leukemia 14:2286-94. Study XIIIA 
ran from December 20, 1991 to August 23, 1994 and enrolled 165 patients, whereas 
Study XIIIB ran from August 24, 1994 to July 27, 1998 and enrolled 247 patients. No 
patients were lost to follow-up during treatment. When the databases were frozen for 
analysis, 100% and 93% of event-free survivors in studies XIIIA and XIIIB, 
respectively, had been seen within 12 months. The median (minimum, maximum) 
follow-up of the event-free survivors was 8.09 (6.59, 9.94) and 4.52 (2.37, 7.06) years 
for XIIIA and XIIIB, respectively. All other samples were obtained from patients 
treated on St. Jude Total Therapy Studies XI, XII, XIV, XV, or by best clinical 
management. 

For the identification of gene expression profiles that predict specific genetic 
subtypes of ALL, 327 diagnostic BM samples were used. The criteria for inclusion in 
this data set were the availability of a cryopreserved diagnostic BM sample containing 
≥75% blasts, and complete data from each of the following diagnostic studies: 
morphology, immunophenotype, cytogenetics, DNA ploidy, Southern blot for MLL 
gene rearrangements, and RT-PCR analysis for MLL-AF4, MLL-AF9, E2A-PBX1, 
TEL-AML1, and BCR-ABL. This final data set includes diagnostic BM samples 
from XV (38), XIV (4), XIIIA (100), XIIIB (161), or from patients treated on one of 
our older protocols or by best clinical management (24). 

The data sets used to identify expression profiles predictive of hematologic 
relapse and the development of therapy-induced AML are described in Table 1. 

Table 1: Patient Database 

Diagnostic samples used for subtype classification (n=327) 

BCR-ABL subgroup (n=15) 

Label                 Protocol     Outcome         Label                 Protocol     Outcome 
BCR-ABL-C1            T13B         CCR             BCR-ABL-#4            T11          NA 
BCR-ABL-R1            T13A         Heme Relapse    BCR-ABL-#5            T12          NA 
BCR-ABL-R2            T13A         Heme Relapse    BCR-ABL-#6            T12          NA 
BCR-ABL-R3            T13B         Heme Relapse    BCR-ABL-#7            T12          NA 
BCR-ABL-Hyperdip-R5   T13B         Heme Relapse    BCR-ABL-#8            T14          NA 
BCR-ABL-#1            T13A         Censored        BCR-ABL-#9            T15          NA 
BCR-ABL-#2            T13B         Censored        BCR-ABL-Hyperdip-#10  T12          NA 
BCR-ABL-#3            T13B         Censored 

E2A-PBX1 subgroup (n=27) 

E2A-PBX1-C1           T13A         CCR             E2A-PBX1-#1           Others       NA 
E2A-PBX1-C2           T13A         CCR             E2A-PBX1-#2           Others       NA 
E2A-PBX1-C3           T13A         CCR             E2A-PBX1-#3           Others       NA 
E2A-PBX1-C4           T13A         CCR             E2A-PBX1-#4           Others       NA 
E2A-PBX1-C5           T13A         CCR             E2A-PBX1-#5           Others       NA 
E2A-PBX1-C6           T13B         CCR             E2A-PBX1-#6           Others       NA 
E2A-PBX1-C7           T13B         CCR             E2A-PBX1-#7           T11          NA 
E2A-PBX1-C8           T13B         CCR             E2A-PBX1-#8           T11          NA 
E2A-PBX1-C9           T13B         CCR             E2A-PBX1-#9           T12          NA 
E2A-PBX1-C10          T13B         CCR             E2A-PBX1-#10          T12          NA 
E2A-PBX1-C11          T13B         CCR             E2A-PBX1-#11          T14          NA 
E2A-PBX1-C12          T13B         CCR             E2A-PBX1-#12          T15          NA 
E2A-PBX1-R1           T13B         Heme Relapse    E2A-PBX1-#13          T15          NA 
E2A-PBX1-2M#1         T13B         2nd AML 

Hyperdip>50 subgroup (n=64) 

Hyperdip>50-C1        T13A         CCR             Hyperdip>50-C33       T13B         CCR 
Hyperdip>50-C2        T13A         CCR             Hyperdip>50-C34       T13B         CCR 
Hyperdip>50-C3        T13A         CCR             Hyperdip>50-C35       T13B         CCR 
Hyperdip>50-C4        T13A         CCR             Hyperdip>50-C36       T13B         CCR 
Hyperdip>50-C5        T13A         CCR             Hyperdip>50-C37       T13B         CCR 
Hyperdip>50-C6        [illegible]  CCR             Hyperdip>50-C38       T13B         CCR 
Hyperdip>50-C7        T13A         CCR             Hyperdip>50-C39       T13B         CCR 
Hyperdip>50-C8        T13A         CCR             Hyperdip>50-C40       [illegible]  CCR 
Hyperdip>50-C9        T13A         CCR             Hyperdip>50-C41       T13B         CCR 
Hyperdip>50-C10       T13A         CCR             Hyperdip>50-C42       T13B         CCR 
Hyperdip>50-C11       T13A         CCR             Hyperdip>50-C43       T13B         CCR 
Hyperdip>50-C12       T13A         CCR             Hyperdip>50-R1        T13A         Heme Relapse 
Hyperdip>50-C13       T13A         CCR             Hyperdip>50-R2        T13A         Heme Relapse 
Hyperdip>50-C14       T13A         CCR             Hyperdip>50-R3        T13A         Heme Relapse 
Hyperdip>50-C15       T13B         CCR             Hyperdip>50-R4        T13B         Heme Relapse 
Hyperdip>50-C16       T13B         CCR             Hyperdip>50-R5        T13B         Heme Relapse 
Hyperdip>50-C17       T13B         CCR             Hyperdip>50-2M#1      T13A         2nd AML 
Hyperdip>50-C18       T13B         CCR             Hyperdip>50-2M#2      T13B         2nd AML 
Hyperdip>50-C19       T13B         CCR             Hyperdip>50-#1        T13A         Censored 
Hyperdip>50-C20       T13B         CCR             Hyperdip>50-#2        T13B         Censored 
Hyperdip>50-C21       T13B         CCR             Hyperdip>50-#3        [illegible]  NA 
Hyperdip>50-C22       T13B         CCR             Hyperdip>50-#4        Others       NA 
Hyperdip>50-C23       T13B         CCR             Hyperdip>50-#5        T12          NA 
Hyperdip>50-C24       T13B         CCR             Hyperdip>50-#6        T15          NA 
Hyperdip>50-C25       T13B         CCR             Hyperdip>50-#7        T15          NA 
Hyperdip>50-C26       T13B         CCR             Hyperdip>50-#8        T15          NA 
Hyperdip>50-C27       T13B         CCR             Hyperdip>50-#9        T15          NA 
Hyperdip>50-C28       T13B         CCR             Hyperdip>50-#10       T15          NA 
Hyperdip>50-C29       T13B         CCR             Hyperdip>50-#11       T15          NA 
Hyperdip>50-C30       T13B         CCR             Hyperdip>50-#12       T15          NA 
Hyperdip>50-C31       T13B         CCR             Hyperdip>50-#13       T15          NA 
Hyperdip>50-C32       T13B         CCR             Hyperdip>50-#14       T15          NA 

Hyperdip47-50 subgroup (n=23) 

Hyperdip47-50-C1      T13A         CCR             Hyperdip47-50-C13     T13B         CCR 
Hyperdip47-50-C2      T13A         CCR             Hyperdip47-50-C14-N   T13B         CCR 
Hyperdip47-50-C3-N    T13A         CCR             Hyperdip47-50-C15     T13B         CCR 
Hyperdip47-50-C4      T13A         CCR             Hyperdip47-50-C16     T13B         CCR 
Hyperdip47-50-C5      T13A         CCR             Hyperdip47-50-C17     T13B         CCR 
Hyperdip47-50-C6      T13B         CCR             Hyperdip47-50-C18     T13B         CCR 
Hyperdip47-50-C7      T13B         CCR             Hyperdip47-50-C19     T13B         CCR 
Hyperdip47-50-C8      T13B         CCR             Hyperdip47-50-2M#1    T13A         2nd AML 
Hyperdip47-50-C9      T13B         CCR             Hyperdip47-50-#1      T15          NA 
Hyperdip47-50-C10     T13B         CCR             Hyperdip47-50-#2      T15          NA 
Hyperdip47-50-C11     T13B         CCR             Hyperdip47-50-#3      T15          NA 
Hyperdip47-50-C12     T13B         CCR 

Hypodip subgroup (n=9) 

Hypodip-C1            T13A         CCR             Hypodip-C6            T13B         CCR 
Hypodip-C2            T13A         CCR             Hypodip-2M#1          T13A         2nd AML 
Hypodip-C3            T13B         CCR             Hypodip-#1            T15          NA 
Hypodip-C4            T13B         CCR             Hypodip-#2            T15          NA 
Hypodip-C5            T13B         CCR 

MLL subgroup (n=20) 

MLL-C1                T13A         CCR             MLL-2M#1              T13A         2nd AML 
MLL-C2                T13B         CCR             MLL-2M#2              T13A         2nd AML 
MLL-C3                T13B         CCR             MLL-#1                T13B         Censored 
MLL-C4                T13B         CCR             MLL-#2                T13B         Censored 
MLL-C5                T13B         CCR             MLL-#3                Others       NA 
MLL-C6                T13B         CCR             MLL-#4                Others       NA 
MLL-R1                T13A         Heme Relapse    MLL-#5                Others       NA 
MLL-R2                T13A         Heme Relapse    MLL-#6                T12          NA 
MLL-R3                T13B         Heme Relapse    MLL-#7                T14          NA 
MLL-R4                T13B         Heme Relapse    MLL-#8                T14          NA 

Normal subgroup (n=18) 

Normal-C1-N           T13A         CCR             Normal-C10            T13B         CCR 
Normal-C2-N           T13A         CCR             Normal-C11-N          T13B         CCR 
Normal-C3-N           T13A         CCR             Normal-C12            T13B         CCR 
Normal-C4-N           T13B         CCR             Normal-R1             [illegible]  Heme Relapse 
Normal-C5             T13B         CCR             Normal-R2             T13B         Heme Relapse 
Normal-C6             T13B         CCR             Normal-R3             T13B         Heme Relapse 
Normal-C7-N           T13B         CCR             Normal-#1             T13A         Censored 
Normal-C8             T13B         CCR             Normal-#2             T13B         [illegible] 
Normal-C9             T13B         CCR             Normal-#3             T13B         [illegible] 

Pseudodip subgroup (n=29) 

Pseudodip-C1          T13A         CCR             Pseudodip-C16-N       T13B         CCR 
Pseudodip-C2-N        T13A         CCR             Pseudodip-C17         T13B         CCR 
Pseudodip-C3          T13A         CCR             Pseudodip-C18         T13B         CCR 
Pseudodip-C4          T13A         CCR             Pseudodip-C19         T13B         CCR 
Pseudodip-C5          T13A         CCR             Pseudodip-R1-N        T13A         Heme Relapse 
Pseudodip-C6          T13A         CCR             Pseudodip-#1          T13B         Other Relapse 
Pseudodip-C7          T13A         CCR             Pseudodip-#2          T13B         Censored 
Pseudodip-C8          T13A         CCR             Pseudodip-#3          Others       NA 
Pseudodip-C9          T13A         CCR             Pseudodip-#4          Others       NA 
Pseudodip-C10         T13B         CCR             Pseudodip-#5          T15          NA 
Pseudodip-C11         T13B         CCR             Pseudodip-#6          T15          NA 
Pseudodip-C12         T13B         CCR             Pseudodip-#7          T15          NA 
Pseudodip-C13         T13B         CCR             Pseudodip-#8-N        T15          NA 
Pseudodip-C14         T13B         CCR             Pseudodip-#9          T15          NA 
Pseudodip-C15         T13B         CCR 

T-ALL subgroup (n=43) 

T-ALL-C1              T13A         CCR             T-ALL-C23             T13B         CCR 
T-ALL-C2              T13A         CCR             T-ALL-C24             T13B         CCR 
T-ALL-C3              T13A         CCR             T-ALL-C25             T13B         CCR 
T-ALL-C4              T13A         CCR             T-ALL-C26             T13B         CCR 
T-ALL-C5              T13A         CCR             T-ALL-R1              T13A         Heme Relapse 
T-ALL-C6              T13A         CCR             T-ALL-R2              T13B         Heme Relapse 
T-ALL-C7              T13A         CCR             T-ALL-R3              T13B         Heme Relapse 
T-ALL-C8              T13A         CCR             T-ALL-R4              T13B         Heme Relapse 
T-ALL-C9              T13B         CCR             T-ALL-R5              T13B         Heme Relapse 
T-ALL-C10             T13B         CCR             T-ALL-R6              T13B         Heme Relapse 
T-ALL-C11             T13B         CCR             T-ALL-2M#1            T13B         2nd AML 
T-ALL-C12             T13B         CCR             T-ALL-#1              T13B         Other Relapse 
T-ALL-C13             T13B         CCR             T-ALL-#2              T13B         Other Relapse 
T-ALL-C14             T13B         CCR             T-ALL-#4              T13B         Censored 
T-ALL-C15             T13B         CCR             T-ALL-#5              T13B         Censored 
T-ALL-C16             T13B         CCR             T-ALL-#6              T15          NA 
T-ALL-C17             T13B         CCR             T-ALL-#7              T15          NA 
T-ALL-C18             T13B         CCR             T-ALL-#8              T15          NA 
T-ALL-C19             T13B         CCR             T-ALL-#9              T15          NA 
T-ALL-C20             T13B         CCR             T-ALL-#10             T15          NA 
T-ALL-C21             T13B         CCR             T-ALL-#11             T15          NA 
T-ALL-C22             T13B         CCR 

TEL-AML1 subgroup (n=79) 

TEL-AML1-C1           T13A         CCR             TEL-AML1-C41          T13B         CCR 
TEL-AML1-C2           T13A         CCR             TEL-AML1-C42          T13B         CCR 
TEL-AML1-C3           T13A         CCR             TEL-AML1-C43          T13B         CCR 
TEL-AML1-C4           T13A         CCR             TEL-AML1-C44          T13B         CCR 
TEL-AML1-C5           T13A         CCR             TEL-AML1-C45          T13B         CCR 
TEL-AML1-C6           T13A         CCR             TEL-AML1-C46          T13B         CCR 
TEL-AML1-C7           T13A         CCR             TEL-AML1-C47          T13B         CCR 
TEL-AML1-C8           T13A         CCR             TEL-AML1-C48          T13B         CCR 
TEL-AML1-C9           T13A         CCR             TEL-AML1-C49          T13B         CCR 
TEL-AML1-C10          T13A         CCR             TEL-AML1-C50          T13B         CCR 
TEL-AML1-C11          T13A         CCR             TEL-AML1-C51          T13B         CCR 
TEL-AML1-C12          T13A         CCR             TEL-AML1-C52          T13B         CCR 
TEL-AML1-C13          T13A         CCR             TEL-AML1-C53          T13B         CCR 
TEL-AML1-C14          T13A         CCR             TEL-AML1-C54          T13B         CCR 
TEL-AML1-C15          T13A         CCR             TEL-AML1-C55          T13B         CCR 
TEL-AML1-C16          T13A         CCR             TEL-AML1-C56          T13B         CCR 
TEL-AML1-C17          T13A         CCR             TEL-AML1-C57          [illegible]  CCR 
TEL-AML1-C18          T13A         CCR             TEL-AML1-R1           T13A         Heme Relapse 
TEL-AML1-C19          T13A         CCR             TEL-AML1-R2           T13A         Heme Relapse 
TEL-AML1-C20          T13A         CCR             TEL-AML1-R3           [illegible]  Heme Relapse 
TEL-AML1-C21          T13A         CCR             TEL-AML1-2M#1         T13A         2nd AML 
TEL-AML1-C22          T13A         CCR             TEL-AML1-2M#2         T13A         2nd AML 
TEL-AML1-C23          T13A         CCR             TEL-AML1-2M#3         T13A         2nd AML 
TEL-AML1-C24          T13A         CCR             TEL-AML1-2M#4         T13B         2nd AML 
TEL-AML1-C25          T13A         CCR             TEL-AML1-2M#5         T13B         2nd AML 
TEL-AML1-C26          T13A         CCR             TEL-AML1-#1           T13B         Other Relapse 
TEL-AML1-C27          T13A         CCR             TEL-AML1-#2           T13A         Censored 
TEL-AML1-C28          T13A         CCR             TEL-AML1-#3           T13A         Censored 
TEL-AML1-C29          T13B         CCR             TEL-AML1-#4           T13B         Censored 
TEL-AML1-C30          T13B         CCR             TEL-AML1-#5           T15          NA 
TEL-AML1-C31          T13B         CCR             TEL-AML1-#6           T15          NA 
TEL-AML1-C32          T13B         CCR             TEL-AML1-#7           T15          NA 
TEL-AML1-C33          T13B         CCR             TEL-AML1-#8           T15          NA 
TEL-AML1-C34          T13B         CCR             TEL-AML1-#9           T15          NA 
TEL-AML1-C35          T13B         CCR             TEL-AML1-#10          T15          NA 
TEL-AML1-C36          T13B         CCR             TEL-AML1-#11          T15          NA 
TEL-AML1-C37          T13B         CCR             TEL-AML1-#12          T15          NA 
TEL-AML1-C38          T13B         CCR             TEL-AML1-#13          T15          NA 
TEL-AML1-C39          T13B         CCR             TEL-AML1-#14          T15          NA 
TEL-AML1-C40          T13B         CCR 

Label key: 
Subtype Name-C#     Dx sample of patient in CCR 
Subtype Name-R#     Dx sample of patient who developed a hematologic relapse 
Subtype Name-#      Dx sample used for subgroup classification only 
Subtype Name-2M#    Dx sample of patient who later developed 2nd AML 
Subtype Name-N      Dx sample in novel group 

Protocol: Protocol that patient was treated on 

Outcome: 
CCR                 Continuous complete remission 
Heme Relapse        Hematologic relapse 
Other Relapse       Extramedullary relapse 
2nd AML             Diagnostic samples of patients who later developed 2nd AML 
Censored            Censored due to BM transplant, treated off protocol, or died in CR 
NA                  Not applicable, primarily because the patient was not treated on 
                    Total 13, and thus is excluded from the analysis used to identify 
                    gene expression profiles predictive of outcome 



H. Diagnostic Samples Used for Prediction of Prognosis 

In addition to the 201 CCR and 27 Heme Relapse cases listed in Table 1, five 
additional relapse cases were also included in the prognostic analysis, giving a total of 
233 cases for this analysis. These additional cases were not included in the subgroup 
prediction data set because they did not meet the established criteria for the reasons 
listed below. 

Label          Protocol    Comment 
BCR-ABL-R4     T13B        Did not meet QC criteria because contained 70% blasts 
MLL-R5         T13A        Peripheral Blood Sample (90% blasts) 
Normal-R4      T13B        Molecular studies not performed 
T-ALL-R7       T13A        Peripheral Blood Sample (90% blasts) 
T-ALL-R8       T13B        Peripheral Blood Sample (90% blasts) 



20 I. Diagnostic Samples used for prediction of Secondary AML 

hi addition to the 201 CCR and 13 secondary AML cases listed in Table 1, 
three additional diagnostic marrow samples from patients who developed secondary 
AML were also included in the prognostic analysis. This gives a total of 217 cases 
used for this analysis. These additional cases were not included in the diagnostic data 

25 set because they did not meet the established criteria for the reasons listed below. 

Label               Protocol   Comment

Hyperdip>50-2M#3    T12        Non Total 13 diagnostic sample
Hypodip-2M#2        T13B       No molecular studies performed
Hypodip-2M#3        T12        Non Total 13 diagnostic sample

Relapsed Samples (n=25)

Twenty-five relapse samples were analyzed: 17 samples that were paired to
the diagnostic samples listed above (Subtype Name-2M#), and 8 additional
non-paired relapse samples.

Detailed Analysis

A. Hierarchical cluster analysis of diagnostic cases using all genes that passed the 
variation filter 

Two-dimensional hierarchical clustering was performed using the Pearson
correlation coefficient and the unweighted pair group method using arithmetic
averages (GeneMaths, version 1.5). The results of hierarchical clustering of the 327
diagnostic samples using the 10,991 probe sets that passed the variation filter can be
viewed at our web site, www.stjuderesearch.org/ALL1.
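
As an illustration of the clustering step described above, the following is a minimal Python sketch of average-linkage (UPGMA) clustering with a Pearson-based distance. It is not the GeneMaths implementation: the function names `pearson` and `upgma`, the choice of 1 - r as the distance, and the toy expression profiles are assumptions made for the example. Two-dimensional clustering would apply the same procedure once to the samples and once to the probe sets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for constant profiles


def upgma(profiles):
    """Average-linkage agglomerative clustering (UPGMA).

    profiles maps a label to its expression vector. The distance between
    two profiles is assumed to be 1 - Pearson r. Returns the merges in
    order as (cluster_a, cluster_b, distance); clusters are label tuples.
    """
    labels = list(profiles)
    d = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
         for i, a in enumerate(labels) for b in labels[i + 1:]}

    def avg_dist(c1, c2):
        # Unweighted mean over all pairs of members of the two clusters.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        dist, i, j = min((avg_dist(c1, c2), i, j)
                         for i, c1 in enumerate(clusters)
                         for j, c2 in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy data (hypothetical): s1/s2 rise together, s3/s4 fall together.
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.5],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [8.0, 6.0, 4.0, 2.5],
}
merges = upgma(profiles)
```

With these toy profiles the two co-varying pairs are joined first and the two resulting clusters are joined last at a distance close to 2, which corresponds to strongly anti-correlated groups.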


B. Methods for gene selection 

Discriminating genes for the various leukemia subtypes were selected using a 
variety of statistical metrics. The individual metrics used and the list of selected probe 
sets and corresponding genes are given below. 


1. Chi-Square 

The Chi square method evaluates each gene individually by measuring its Chi
square statistic with respect to the classes. The method first discretizes the observed
expression values of the gene into several intervals using an entropy-based
discretization method¹. The Chi square statistic of a gene is then calculated as

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},$$

where m is the number of intervals and k is the number of classes. $A_{ij}$ is the
number of samples in the i-th interval that are of the j-th class. $E_{ij}$ is the expected
frequency of $A_{ij}$ and is calculated as $E_{ij} = R_i \cdot C_j / N$, where $R_i$ is the number of
samples in the i-th interval, $C_j$ is the number of samples in the j-th class, and N is the
total number of samples. The genes are then sorted according to their Chi square
statistics: the larger the Chi square statistic, the more important the gene. The 40
genes with the highest Chi square statistics in each subtype are listed in Tables 2-8.
Generally, using anywhere from the top 20 to 40 genes did not result in significant
differences in subtype prediction accuracy. Therefore, only the top 20 genes were
used in subtype prediction, unless noted otherwise.
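
The statistic above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to generate Tables 2-8: the function names `chi_square` and `rank_genes` are hypothetical, the expression values are assumed to be already discretized into interval indices (the entropy-based discretization itself is not shown), and the two-class labeling in the example is an assumption.

```python
def chi_square(intervals, classes):
    """Chi square statistic of one discretized gene.

    intervals[s] is the interval index of sample s and classes[s] is its
    class label. Implements sum over i, j of (A_ij - E_ij)^2 / E_ij with
    E_ij = R_i * C_j / N.
    """
    n = len(classes)
    row_totals = {}  # R_i: samples per interval
    col_totals = {}  # C_j: samples per class
    observed = {}    # A_ij: samples in interval i that belong to class j
    for i, j in zip(intervals, classes):
        row_totals[i] = row_totals.get(i, 0) + 1
        col_totals[j] = col_totals.get(j, 0) + 1
        observed[(i, j)] = observed.get((i, j), 0) + 1

    stat = 0.0
    for i, r in row_totals.items():
        for j, c in col_totals.items():
            expected = r * c / n
            stat += (observed.get((i, j), 0) - expected) ** 2 / expected
    return stat


def rank_genes(discretized, classes, top=20):
    """Return the names of the `top` genes with the largest statistic."""
    scores = {gene: chi_square(vals, classes)
              for gene, vals in discretized.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical example: 4 samples of one subtype versus 4 other samples.
classes = ["subtype"] * 4 + ["other"] * 4
separating = [0, 0, 0, 0, 1, 1, 1, 1]     # interval tracks the class
uninformative = [0, 0, 1, 1, 0, 0, 1, 1]  # interval independent of class
```

For a gene whose intervals separate two equally sized classes perfectly, every expected count is N/4 and the statistic equals N; for a gene whose intervals are independent of the class it is 0.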
Table 2. Genes selected by Chi square: BCR-ABL














No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square | Above/Below
1 | 1637_at | mitogen-activated protein kinase-activated protein kinase 3 | MAPKAPK3 | U09578 | 62.75 | Above
2 | 36650_at | cyclin D2 | CCND2 | D13639 | 59.79 | Above
3 | 40196_at | HYA22 protein | HYA22 | D88153 | 54.79 | Above
4 | 1635_at | proto-oncogene tyrosine-protein kinase ABL gene | ABL | U07563 | 54.77 | Above
5 | 33775_s_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98176 | 49.70 | Above
6 | 1636_g_at | proto-oncogene tyrosine-protein kinase ABL gene | ABL | U07563 | 48.29 | Above
7 | 41295_at | GTT1 protein | GTT1 | AL041780 | 42.60 | Above
8 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | 42.60 | Above
9 | 37012_at | capping protein actin filament muscle Z-line beta | CAPZB | U03271 | 38.46 | Above
10 | 39225_at | alkylglycerone phosphate synthase | AGPS | Y09443 | 38.46 | Above
11 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 |  | 
12 | 34362_at | solute carrier family 2 facilitated glucose transporter member 5 | SLC2A5 | M55531 | 37.54 | Above
13 | 33150_at | disrupter of silencing 10 | SAS10 | AI126004 | 36.95 | Above
14 | 40051_at | TRAM-like protein | KIAA0057 | D31762 | 36.95 | Above
15 | 39061_at | bone marrow stromal cell antigen 2 | BST2 | D28137 | 36.95 | Above
16 | 33172_at | hypothetical protein FLJ10849 | FLJ10849 | T75292 | 36.95 | Above
17 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 36.95 | Above
18 | 317_at | protease cysteine 1 legumain | PRSC1 | D55696 | 36.95 | Above
19 | 40953_at | calponin 3 acidic | CNN3 | S80562 | 33.94 | Above
20 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 33.32 | Above
21 | 40504_at | paraoxonase 2 | PON2 | AF001601 | 31.46 | Above
22 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 |  | 
23 | 39044_s_at | diacylglycerol kinase delta 130kD | DGKD | D73409 | 29.59 | Below
24 | 36634_at | BTG family member 2 | BTG2 | U72649 | 29.16 | Below
25 | 38119_at | glycophorin C Gerbich blood group | GYPC | X12496 | 29.16 | Above
26 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | 27.96 | Above
27 | 33228_g_at | interleukin 10 receptor beta | IL10RB | AI984234 | 27.70 | Below
28 | 37006_at | step II splicing factor SLU7 | SLU7 | AI660656 | 27.15 | Above
29 |  | Homo sapiens mRNA for TSC-22-like protein |  |  |  | 
30 | 38220_at | dihydropyrimidine dehydrogenase | DPYD | U20938 | 27.15 | Above
31 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 26.46 | Above
32 |  | v-abl Abelson murine leukemia viral oncogene homolog 1 |  |  |  | 
33 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 25.90 | Above
34 | 36035_at | anchor attachment protein 1 Gaa1p yeast homolog | GPAA1 | AB002135 | 25.34 | Above
35 | 980_at | Niemann-Pick disease type C1 | NPC1 | AF002020 | 25.29 | Above
36 | 671_at | secreted protein acidic cysteine-rich osteonectin | SPARC | J03040 | 25.29 | Above
37 |  | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced |  |  |  | 
38 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | 23.70 | Above
39 | 1983_at | cyclin D2 | CCND2 | X68452 | 23.70 | Above
40 | 2001_g_at | ataxia | ATM | U26455 |  | 



Table 3: Genes selected by Chi Square for E2A-PBX1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 |  | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 187.00 | Above
2 |  | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 187.00 | Above
3 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 187.00 | Above
4 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 187.00 | Above
5 | 430_at | nucleoside phosphorylase | NP | X00737 | 187.00 | Above
6 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 176.11 | Above
7 | 753_at | nidogen 2 | NID2 | D86425 | 164.28 | Above
8 | 33821_at | Human DNA sequence from clone RP3-483K16 on chromosome 6p12.1-21.1 | HELO1 | AL034374 | 155.00 | Above
9 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 153.46 | Above
10 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 143.85 | Above
11 |  | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 142.34 | Above
12 | 39929_at | KIAA0922 protein |  | AB023139 | 139.97 | Above
13 |  | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 |  |  |  | 
14 |  | GS3955 protein | GS3955 | D87119 | 135.24 | Above
15 | 362_at | protein kinase C zeta |  | Z15108 | 131.36 | 
16 | 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | 131.36 | Above
17 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | 131.36 | Above
18 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 130.95 | Above
19 | 35974_at | lymphoid-restricted membrane protein | LRMP | U10485 | 123.33 | Above
20 | 36452_at | synaptopodin | KIAA1029 | AB028952 | 123.33 | Above
21 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 120.51 | Above
22 | 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | 120.51 | Above
23 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 118.58 | Below
24 | 34861_at | golgi autoantigen golgin subfamily a3 | GOLGA3 | D63997 | 116.80 | 
25 | 38748_at | adenosine deaminase RNA-specific B1 homolog of rat RED1 | ADARB1 | U76421 | 114.13 | Above
26 | 40113_at | GS3955 protein | GS3955 | D87119 | 114.13 | Above
27 | 36179_at | mitogen-activated protein kinase-activated protein kinase 2 | MAPKAPK2 | U12779 |  | 
28 |  | colony stimulating factor 2 receptor beta low-affinity granulocyte-macrophage | CSF2RB | H04668 | 113.04 | Above
29 |  | Human recombination activating protein (RAG2) gene | RAG2 | M94633 | 111.32 | Above
30 | 41017_at | myosin-binding protein H | MYBPH | U27266 | 109.73 | Above
31 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 108.51 | Above
32 | 38679_g_at | small nuclear ribonucleoprotein polypeptide E | SNRPE | AA733050 | 106.02 | Above
33 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 105.65 | Below
34 | 34783_s_at | BUB3 budding uninhibited by benzimidazoles 3 yeast homolog |  | AF047473 | 103.87 | Above
35 | 36959_at | ubiquitin-conjugating enzyme E2 | UBE2V1 | U49278 | 103.87 | Above
36 | 39864_at | cold inducible RNA-binding protein | CIRBP | D78134 | 99.76 | Below
37 | 41862_at | KIAA0056 protein | KIAA0056 | D29954 | 99.76 | Above
38 |  |  | FLI1 | M98833 | 96.47 | Above
39 | 37177_at | CD58 antigen lymphocyte function-associated antigen 3 | CD58 | Y00636 | 93.84 | Above
40 |  | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | 93.17 | Above

WO 03/083140    PCT/US03/08486



Table 4: Genes selected by Chi square for Hyperdiploid >50

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square | Above/Below
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | X02317 | 52.43 | Above
2 | 37350_at | Human DNA sequence from clone 889N15 on chromosome Xq22.1- | PSMD10 | AL031177 | 48.71 | Above
3 | 171_at | von Hippel-Lindau binding protein 1 | VBP1 | U56833 | 45.80 | Above
4 | 37677_at | phosphoglycerate kinase 1 | PGK1 | V00572 | 45.80 | Above
5 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | X81109 | 45.58 | Above
6 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 44.07 | Above
7 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | 43.57 | Above
8 | 40480_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | 43.57 | 
9 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | 43.20 | Above
10 | 41132_r_at | heterogeneous nuclear ribonucleoprotein H2 H | HNRPH2 | U01923 | 43.15 | Above
11 | 31492_at | muscle specific gene | M9 | AB019392 | 43.01 | Below
12 | 38317_at | transcription elongation factor A SII like 1 | TCEAL1 | M99701 | 41.10 | Above
13 | 40998_at | trinucleotide repeat containing 11 THR-associated protein 230 kDa subunit | TNRC11 | AF071309 | 40.88 | Above
14 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | 40.52 | Above
15 | 40903_at | ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9 | APT6M8-9 | AL049929 | 40.33 | Above
16 | 36489_at | phosphoribosyl pyrophosphate synthetase 1 | PRPS1 | D00860 | 40.33 | Above
17 | 1520_s_at | interleukin 1 beta | IL1B |  | 40.29 | Above
18 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 38.74 | 
19 | 38604_at | neuropeptide Y | NPY | AI198311 | 38.26 | Above
20 | 31863_at | KIAA0179 protein | KIAA0179 | D80001 | 38.26 | 
21 | 890_at | ubiquitin-conjugating enzyme E2A RAD6 homolog | UBE2A | M74524 |  | Above
22 | 39402_at | interleukin 1 beta | IL1B | M15330 | 37.92 | Above
23 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 37.72 | Above
24 | 34753_at | synaptobrevin-like 1 | SYBL1 | X92396 | 37.72 | Above
25 | 40891_f_at | DNA segment on chromosome X unique 9879 expressed sequence | DXS9879E | X92896 | 37.15 | Above
26 | 306_s_at | high-mobility group nonhistone chromosomal protein 14 | HMG14 | J02621 | 37.15 | Above
27 |  | hypoxanthine phosphoribosyltransferase 1 Lesch-Nyhan syndrome |  |  |  | 
28 | 34829_at | dyskeratosis congenita 1 dyskerin | DKC1 | U59151 | 36.48 | Above
29 | 36169_at | NADH dehydrogenase ubiquinone 1 alpha subcomplex 1 7.5kD MWFE | NDUFA1 | N47307 | 36.48 | Above
30 |  | SH3-domain binding protein 5 BTK-associated | SH3BP5 |  |  | 
31 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | 35.88 | Above
32 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 35.65 | Above
33 | 34374_g_at | upstream regulatory element binding protein 1 | UREB1 | Z97054 | 35.55 | Above
34 | 36542_at | solute carrier family 9 sodium/hydrogen exchanger isoform 6 | SLC9A6 | AF030409 | 35.55 | Above
35 | 688_at | proteasome prosome macropain 26S subunit ATPase 1 | PSMC1 | L02426 | 35.55 | Above
36 | 955_at | calmodulin type I |  | HG1862-HT1897 | 35.55 | Above
37 | 35816_at | cystatin B stefin B | CSTB | U46692 | 35.27 | Above
38 | 38459_g_at | Human cytochrome b5 (CYB5) gene | CYB5 | L39945 | 35.18 | Above
39 | 41288_at | matrix Gla protein | MGP | AL036744 | 35.18 | Above
40 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 35.14 | Above


Table 5: Genes selected by Chi square for MLL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 34306_at | muscleblind Drosophila like | MBNL | AB007888 | 64.07 | Above
2 | 40797_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | AF009615 | 62.85 | Above
3 |  | LGALS1 Lectin, galactoside-binding, soluble, 1 | LGALS1 | AI535946 | 57.97 | Above
4 |  | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | 57.97 | Above
5 |  | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 55.22 | Above
6 | 32193_at | plexin C1 | PLXNC1 | AF030339 | 53.59 | Above
7 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | 53.40 | Above
8 |  | DNA segment on chromosome 12 unique 2489 expressed sequence | D12S2489E | AJ001687 | 51.47 | Above
9 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 50.73 | Below
10 | 33859_at | sin3-associated polypeptide 18kD | SAP18 | U96915 | 50.48 | Above






11 | 38391_at | capping protein actin filament gelsolin-like | CAPG | M94345 | 50.26 | Above
12 | 40763_at | Meis1 mouse homolog | MEIS1 | U85707 | 50.26 | Above
13 | 1126_s_at | cell surface glycoprotein CD44 gene | CD44 | L05424 | 50.17 | Above
14 | 34721_at | FK506-binding protein 5 | FKBP5 | U42031 | 50.17 | Above
15 | 37809_at | homeo box A9 | HOXA9 | U41813 | 50.17 | Above
16 | 34861_at | golgi autoantigen golgin subfamily a 3 | GOLGA3 | D63997 | 47.58 | Below
17 | 38194_s_at | immunoglobulin kappa constant | IGKC | M63438 | 46.18 | Below
18 | 657_at | protocadherin gamma subfamily C3 | PCDHGC3 | L11373 | 46.05 | Above
19 | 36918_at | guanylate cyclase 1 soluble alpha 3 | GUCY1A3 | Y15723 | 43.90 | Above
20 | 32215_i_at | KIAA0878 protein | KIAA0878 | AB020685 | 43.90 | Above
21 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | 43.90 | Above
22 | 38413_at | defender against cell death 1 | DAD1 | D15057 | 43.90 | Above
23 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 43.82 | Below
24 | 34168_at | deoxynucleotidyltransferase terminal |  | M11722 | 43.82 | Below
25 | 2036_s_at | CD44 antigen homing function and Indian blood group system |  | M59040 | 42.55 | Above
26 | 40522_at | glutamate-ammonia ligase glutamine synthase |  | X59834 | 42.55 | Above
27 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 42.34 | Above
28 | 40067_at | E74-like factor 1 ets domain transcription factor | ELF1 | M82882 | 40.85 | Above
29 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | 39.95 | Below
30 | 36940_at | TGFB1-induced anti-apoptotic factor 1 | TIAF1 | D86970 | 39.82 | Below
31 | 36935_at | RAS p21 protein activator GTPase activating protein | RASA1 | M23379 | 38.77 | Above
32 | 32134_at | testin | DKFZP586B2022 | AL050162 | 38.77 | Above
33 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 |  | AL049397 | 38.77 | Above
34 | 40493_at | Human cell surface glycoprotein CD44 |  |  | 38.44 | 
35 | 769_s_at | annexin A2 | ANXA2 | D00017 | 37.61 | Above
36 | [illegible] | acetyl-Coenzyme A acyltransferase 1 peroxisomal 3-oxoacyl-Coenzyme A thiolase | ACAA1 | X14813 | 37.55 | Above
37 | 35983_at | hypothetical protein R32184_1 | R32184_1 | AC004528 | 37.55 | 
38 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 36.56 | Above
39 | 794_at | protein tyrosine phosphatase non-receptor type 6 | PTPN6 | X62055 | 36.56 | Above
40 | 41234_at | DnaJ Hsp40 homolog subfamily B member 6 | DNAJB6 | AI540318 | 36.56 | Above
Table 6: Genes selected by Chi square for Novel risk group





No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 175.82 | 
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
3 | 994_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
4 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
5 | 41074_at | G protein-coupled receptor 49 | GPR49 | AF062006 | 139.36 | Above
6 | 41073_at | G protein-coupled receptor 49 | GPR49 | AI743745 | 139.36 | Above
7 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | 137.71 | Above
8 | 36139_at | DKFZP586G0522 protein | DKFZP586G0522 | AL050289 | 127.05 | Above
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 | D86961 | 120.79 | Above
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | 115.15 | Above
11 | 40081_at | phospholipid transfer protein | PLTP | L26232 | 108.33 | Above
12 | 32800_at | Human retinoid X receptor alpha mRNA, 3' UTR, partial sequence | RXR | U66306 | 107.39 | Above
13 | 36906_at | cannabinoid receptor 1 brain | CNR1 | U73304 | 107.39 | Above
14 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | 99.20 | Above
15 | 41747_s_at | Human myocyte-specific enhancer factor 2A (MEF2A) gene, last coding exon, and complete cds. | MEF2A | U49020 | 99.20 | Above
16 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 96.17 | Above
17 | 34947_at | phorbolin-like protein MDS019 | MDS019 | AA442560 | 93.59 | Above
18 | 36029_at | chromosome 11 open reading frame 8 | C11ORF8 | U57911 | 93.59 | Above
19 | 41708_at | KIAA1034 protein | KIAA1034 | AB028957 | 92.60 | Above
20 | 1664_at | insulin-like growth factor 2 |  | HT3739 | 92.60 | 
21 | 32736_at | HSPC022 protein | HSPC022 | W68830 | 91.62 | Below
22 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 86.95 | Above
23 | 36566_at | cystinosis nephropathic | CTNS | AJ222967 | 82.89 | Above
24 | 1825_at | IQ motif containing GTPase activating protein 1 | IQGAP1 | L33075 | 81.20 | Below
25 | 1731_at | platelet-derived growth factor receptor alpha polypeptide | PDGFRA | M21574 | 78.22 | Above
26 | 37023_at | lymphocyte cytosolic protein 1 L-plastin | LCP1 | J02923 | 78.22 | Below
27 | 33037_at | carbohydrate N-acetylglucosamine 6-O sulfotransferase 7 | CHST7 | AL022165 | 76.00 | Above
28 | 33411_g_at | integrin alpha 6 | ITGA6 | S66213 | 75.47 | Above
29 | 538_at | CD34 antigen | CD34 | S53911 | 74.86 | Above
30 |  | lanosterol synthase 2 3-oxidosqualene-lanosterol cyclase |  |  |  | 
31 | 38364_at | BCE-1 protein | BCE-1 | AF068197 | 71.90 | Above
32 | 40423_at | KIAA0903 protein | KIAA0903 | AB020710 | 71.90 | Above
33 | 35192_at | glycine dehydrogenase decarboxylating glycine decarboxylase glycine cleavage system protein P | GLDC | D90239 | 71.29 | Above
34 |  | myeloid/lymphoid or mixed-lineage leukemia trithorax Drosophila homolog translocated to 2 |  |  |  | 
35 | 38747_at | Human CD34 gene, exon 8. | CD34 | M81945 | 69.45 | Above
36 | 37687_i_at | Fc fragment of IgG low affinity IIa receptor for CD32 | FCGR2A | M31932 | 67.75 | Above
37 |  | MAD mothers against decapentaplegic Drosophila homolog 7 |  |  | 66.28 | Above
38 |  | Human PAC clone RP3-515N1 from 22q11.2-q22 |  |  |  | 
39 | 31782_at | prostaglandin D2 receptor DP | PTGDR | U31099 | 61.92 | Above
40 | 32842_at | B-cell CLL/lymphoma 7A | BCL7A | X89984 | 61.57 | Above



Table 7. Genes selected by Chi square for T-ALL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 |  | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 215.00 | Above
2 | 1096_g_at | CD19 antigen | CD19 | M28170 | 206.48 | Below
3 | 38242_at | B cell linker protein | SLP65 | AF068180 | 198.52 | Below
4 | 32794_g_at | T cell receptor beta locus | TRB | X00437 | 197.71 | Above
5 | 37988_at | CD79B antigen | CD79B | M89957 | 197.71 | Below
6 | 38017_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 197.53 | 
7 |  | Human Ia-associated invariant gamma-chain gene, exon 8, clones lambda-y(1,2,3). |  | M13560 | 197.53 | 
8 | 36277_at | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | 191.09 | 
9 | 38095_i_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | 189.78 | 
10 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 |  | 
11 | 38147_at | SH2 domain protein 1A Duncan's disease lymphoproliferative syndrome | SH2D1A | AL023657 |  | 
12 | 41723_s_at | major histocompatibility complex class II DR beta 1 | HLA-DRB1 | M32578 |  | 
13 | 38833_at | Human mRNA for SB class II histocompatibility antigen alpha-chain |  | X00457 | 189.03 | Below
14 | 33238_at | Human T-lymphocyte specific protein tyrosine kinase p56lck (lck) aberrant mRNA | lck | U23852 | 189.03 | Above
15 | 37039_at | major histocompatibility complex class II DR alpha | HLA-DRA | J00194 | 188.93 | Below
16 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | 188.93 | Above
17 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | 187.25 | Below
18 | 38096_f_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | 182.38 | Below
19 | 2059_s_at | lymphocyte-specific protein tyrosine kinase | LCK | M36881 | 182.38 | Above
20 | 1105_s_at | T cell receptor beta locus | TRB | M12886 | 180.45 | Above
21 | 32649_at | transcription factor 7 T-cell specific HMG-box | TCF7 | X59871 | 177.84 | Above
22 | 38949_at | protein kinase C theta | PRKCQ | L01087 | 172.59 | Below
23 | 39709_at | selenoprotein W 1 | SEPW1 | U67171 | 171.96 | Above
24 | 41165_g_at | immunoglobulin heavy constant | IGHM | X67301 | 171.96 | Below
25 | 36473_at | ubiquitin specific protease 20 | USP20 | AB023220 | 167.27 | Above
26 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen |  | L33930 | 165.56 | Below
27 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 165.29 | Below
28 | 40775_at | integral membrane protein 2A | ITM2A | AL021786 | 164.14 | Above
29 | 37420_i_at | Human DNA sequence from clone RP3-377H14 on chromosome 6p21.32-22.1. |  | AL022723 | 164.14 | Below
30 | 1085_s_at | phospholipase C gamma 2 phosphatidylinositol-specific | PLCG2 | M37238 | 161.30 | Below
31 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 160.51 | Below
32 | 35643_at | nucleobindin 2 | NUCB2 | X76732 | 160.07 | Above
33 | [illegible] | immunoglobulin heavy constant mu | IGHM | X58529 | 158.50 | Below
34 | 38415_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | 155.78 | Above
35 | 38893_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 155.78 | Below
36 | 1241_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | 155.78 | Above
37 | 32793_at | T cell receptor beta locus | TRB | X00437 | 155.43 | Above
38 | 36571_at | topoisomerase DNA II beta 180kD | TOP2B | X68060 |  | Below
39 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 151.93 | Above
40 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 151.86 | Below
Table 8: Genes selected by Chi square

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 38652_at | hypothetical protein FLJ20154 | FLJ20154 | AF070644 | 137.92 | Above
2 | 36239_at | POU domain class 2 associating factor 1 | POU2AF1 | Z49194 |  | 
3 | 41442_at | core-binding factor runt domain alpha subunit 2 translocated to 3 | CBFA2T3 | AB010419 | 130.17 | Above
4 | 37780_at | piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 126.79 | Above
5 | 36985_at | isopentenyl-diphosphate delta | IDI1 | X17025 | 125.47 | Above
6 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 115.72 | Above
7 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | 112.87 | Above
8 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 | 108.45 | Above
9 | 32224_at | KIAA0769 gene product | KIAA0769 | AB018312 | 107.08 | Above
10 | 32730_at | Homo sapiens mRNA for KIAA1750 protein partial cds |  | AL080059 | 104.93 | Above
11 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 | Z46973 | 104.83 | Above
12 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | 102.90 | Above
13 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 | AB029035 | 100.67 | Above
14 | 34194_at | Homo sapiens cDNA FLJ21697 fis clone COL09740 |  | AL049313 | 98.31 | Above
15 | 36937_s_at | PDZ and LIM domain 1 elfin | PDLIM1 | U90878 | 96.91 | Below
16 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 96.68 | Above
17 | 1299_at | telomeric repeat binding factor 2 | TERF2 | X93512 | 93.08 | Above
18 | 41814_at | fucosidase alpha-L-1 tissue | FUCA1 | M29877 |  | 
19 | 41200_at | CD36 antigen collagen type I receptor thrombospondin receptor like 1 | CD36L1 | Z22555 | 90.86 | Above
20 | 35238_at | TNF receptor-associated factor 5 | TRAF5 | AB000509 | 90.81 | Above
21 | 880_at | FK506-binding protein 1A 12kD | FKBP1A | M34539 | 86.69 | Above
22 | 33690_at | Homo sapiens mRNA cDNA DKFZp434A202 from clone DKFZp434A202 |  | AL080190 | 86.69 | Above
23 | 40272_at | collapsin response mediator protein 1 | CRMP1 | D78012 | 85.44 | Above
24 | 35362_at | myosin X | MYO10 | AB018342 | 83.60 | Above
25 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 83.25 | Above
26 | 40279_at | KIAA0121 gene product | KIAA0121 | D50911 | 81.66 | Above
27 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 81.66 | Above
28 |  | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 81.17 | Above
29 |  | guanine nucleotide binding protein 11 | GNG11 | U31384 | 80.37 | 
30 | 769_s_at | annexin A2 | ANXA2 | D00017 | 78.68 | Below
31 | 33415_at | non-metastatic cells 2 protein NM23B expressed in | NME2 | X58965 | 77.04 | Below
32 | 1980_s_at | non-metastatic cells 2 protein NM23B expressed in |  | X58965 | 76.35 | 
33 | 32579_at | SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 4 | SMARCA4 | D26156 | 76.35 | Above
34 | 39425_at | thioredoxin reductase 1 | TXNRD1 | X91247 | 75.97 | Above
35 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | 75.56 | Above
36 | 37343_at | inositol 1 4 5-triphosphate receptor type 3 | ITPR3 | U01062 | 75.11 | Above
37 | 1336_s_at | protein kinase C beta 1 | PRKCB1 | X06318 | 73.96 | Above
38 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 73.84 | Above
39 | 31786_at | Sam68-like phosphotyrosine protein T-STAR | T-STAR | AF051321 | 73.72 | Above
40 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | 73.66 | Above



2. Correlation-based Feature Selection (CFS)
Correlation-based Feature Selection (CFS) is a method that evaluates subsets of
genes rather than individual genes (Hall and Holmes (2000), "Benchmarking
Attribute Selection Techniques for Data Mining," Working Paper 00/10, Department
of Computer Science, University of Waikato, New Zealand). The core of the
algorithm is a subset evaluation heuristic that takes into account the usefulness
of individual features for predicting the class along with the level of
intercorrelation among them, with the belief that "good feature subsets contain
features highly correlated with the class, yet uncorrelated with each other". The
heuristic assigns a score, Merit_S, to a subset S containing k genes, defined as
Merit_S = (k * r_cf)/sqrt(k + k * (k - 1) * r_ff), where r_cf is the average
gene-class correlation and r_ff is the average gene-gene correlation. Like the Chi
square method, CFS first discretizes the gene expression values into intervals and
then calculates a matrix of gene-class and gene-gene correlations from the training
data for the merit calculation. The correlation between two genes, or between a
gene and a class, is calculated as r_xy = 2 * [H(X) + H(Y) - H(X,Y)]/[H(X) + H(Y)],
where H(X) is the entropy of a gene X and H(X,Y) is the joint entropy of X and Y.
CFS starts from an empty set of genes and uses the best-first search technique with
a stopping criterion of 5 consecutive fully expanded non-improving subsets. The
subset with the highest merit found during the search is selected. Tables 9-15 list
the top gene subsets chosen by CFS for each subtype. For subtype prediction, each
gene subset must be used in its entirety because all genes within a subset are
equally ranked.
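
The CFS procedure described above may be illustrated by the following minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the implementation used to generate Tables 9-15. The function names (entropy, symmetrical_uncertainty, merit, cfs_best_first), the dictionary-based data layout, and the interpretation of the stopping criterion as five consecutive expansions that fail to improve the best merit are all hypothetical choices made for the example. Expression values are assumed to have been discretized beforehand.

```python
import heapq
import math
from collections import Counter
from itertools import count


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """r_xy = 2 * [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant; treat as uncorrelated
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X,Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def merit(subset, r_class, r_gene):
    """Merit_S = (k * r_cf) / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(r_class[g] for g in subset) / k  # average gene-class correlation
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    # average gene-gene correlation (taken as 0 for a single-gene subset)
    r_ff = sum(r_gene[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_best_first(genes, labels, max_stale=5):
    """Forward best-first search over gene subsets, starting from the empty set.

    genes:  dict mapping a gene name to its list of discretized values
    labels: list of class labels, one per sample
    Stops after max_stale consecutive expansions that do not improve the
    best merit seen so far (an assumed reading of the stopping criterion).
    """
    names = sorted(genes)
    r_class = {g: symmetrical_uncertainty(genes[g], labels) for g in names}
    r_gene = {a: {b: symmetrical_uncertainty(genes[a], genes[b]) for b in names}
              for a in names}
    tie = count()  # tie-breaker so the heap never compares subsets directly
    open_list = [(0.0, next(tie), ())]  # entries: (-merit, tie, subset)
    visited = {()}
    best_subset, best_merit, stale = (), 0.0, 0
    while open_list and stale < max_stale:
        _, _, subset = heapq.heappop(open_list)  # most promising subset
        improved = False
        for g in names:
            if g in subset:
                continue
            child = tuple(sorted(subset + (g,)))
            if child in visited:
                continue
            visited.add(child)
            m = merit(child, r_class, r_gene)
            heapq.heappush(open_list, (-m, next(tie), child))
            if m > best_merit + 1e-12:
                best_subset, best_merit, improved = child, m, True
        stale = 0 if improved else stale + 1
    return best_subset, best_merit
```

In this sketch a redundant gene does not raise the merit of a subset that already contains a perfectly class-correlated gene, which reflects the stated preference for genes that are correlated with the class yet uncorrelated with each other.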



Table 9. Genes selected by CFS for BCR-ABL

Affymetrix number    Gene Name    GeneSymbol    Reference    Above/Below Mean



1 36650_at

2 40196_at 

3 1635_at 

4 33775_s_at 

5 1636_g_at 

6 41295_at 

7 1326_at 

8 33150_at 

9 40051_at

10 39061_at 

11 33172_at 

12 37399_at 

13 317_at 

14 330_s_at 



16 39044_s_at

17 32562_at 



21 36591_at 

22 36035_at



cyclin D2 
HYA22 protein 

proto-oncogene tyrosine-protein
kinase (ABL) gene 

caspase 8 apoptosis-related cysteine 
protease 

proto-oncogene tyrosine-protein
kinase (ABL) gene 

GTT1 protein 

caspase 10 apoptosis-related cysteine 
protease 

disrupter of silencing 10 
TRAM-like protein 
bone marrow stromal cell antigen 2 
hypothetical protein FLJ10849 
aldo-keto reductase family 1 member 
C3 3-alpha hydroxysteroid 
dehydrogenase type II 
protease cysteine 1 legumain 
tubulin, alpha 1, isoform 44

tumor necrosis factor receptor 
superfamily member 7 

diacylglycerol kinase delta 130kD 
endoglin Osler-Rendu-Weber 
syndrome 1 

Homo sapiens mRNA for TSC-22- 
like protein 

CASP2 and RIPK1 domain containing
adaptor with death domain 

v-abl Abelson murine leukemia viral 
oncogene homolog 1 

tubulin alpha 1 testis specific 
anchor attachment protein 1 Gaa1p
yeast homolog 




CCND2 


D13639 


Above 


HYA22 


D88153 


Above 


ABL 


U07563 




CASP8 


X98176 


Above 


ABL 


U07563 


Above 


GTT1 


AL041780 


Above 


CASP10 


U60519 


Above 


SAS10 


AI126004


Above 


KIAA0057 


D31762 


Above 


BST2 


D28137 


Above 


FLJ10849 


T75292 


Above 


AKR1C3 


D17793 


Above 


PRSC1 


D55696 


Above 


TUBA1 


HG2259- 


Above 




HT2348 




TNFRSF7 


M63928 


Above 


DGKD 


D73409 


Below 


ENG 


X72012 


Above 




AJ133115 


Above 


CRADD


U84388 


Above 


ABL1 


X16416


Above 


TUBA1 


X06956 


Above 


GPAA1 


AB002135 


Above 
23 980_at

24 40698_at 



25 39330_s_at

26 2001_g_at 



27 


39319_at 


28 


37685_at


29 


33813_at 


30 


33134_at 


31 


36536_at 


32 


36985_at 


33 


35991_at


34 


33774_at 


35 


37470_at 


36 


39245_at 


37 


40076_at 


38 


39370_at 


39 


41594_at 


40 


41338_at 


41 


32319_at 


42 


33924_at 


43 


37397_at 


44 


37190_at 


45 


39070_at 


46 


38994_at 


47 


32621_at 


48 


40108_at 


49 


35238_at 



Niemann-Pick disease type C1 NPC1 AF002020

C-type calcium dependent CLECSF2 X96719 

carbohydrate-recognition domain 
lectin superfamily member 2
activation-induced 

actinin alpha 1 ACTN1 M95178 

ataxia telangiectasia mutated includes ATM U26455 
complementation groups A C and D 

lymphocyte cytosolic protein 2 SH2 LCP2 U20158

domain-containing leukocyte protein 

of 76kD

Clathrin assembly lymphoid-myeloid CLTH U45976 
leukemia gene 



tumor necrosis factor receptor 
superfamily member 1B

adenylate cyclase 3 
schwannomin interacting protein 1
isopentenyl-diphosphate delta 



TNFRSF1B AI813532



Sm protein F 
caspase 



50 1558_g_at 



leukocyte-associated Ig-like receptor 
1 

Human 40871 mRNA partial 
sequence 

tumor protein D52-like 2 
Microtubule-associated proteins 1A
and 1B light chain 3

Janus kinase 1 a protein tyrosine 
kinase 

amino-terminal enhancer of split
tumor necrosis factor ligand 
superfamily member 4 tax- 
transcriptionally activated 
glycoprotein 1 34kD 
KIAA1091 protein 
platelet/endothelial cell adhesion 
molecule-1 (PECAM-1) gene 

WAS protein family member 1 
singed Drosophila like sea urchin 
fascin homolog like 

STAT induced STAT inhibitor-2 
down-regulator of transcription 1 
TBP-binding negative cofactor 2 

KIAA0005 gene product 
TNF receptor-associated factor 5 
p21/Cdc42/Rac1-activated kinase 1
yeast Ste20-related 



Above 
Above 



Above 
Above 



ADCY3 


AB011083


Above 


SCHIP-1 


AF070614 


Above 


IDI1 


X17025 


Below 


LSM6 


AA917945


Above 


CASP8 


X98172 


Above 


LAIR1 


AF013249


Above 




U72507 


Above 


TPD52L2 


AF004430 


Below 


MAP1ALC3 


W28807 


Below 


JAK1 


M64174 


Above 


AES 


AI969192 


Below 


TNFSF4 


AL022310 


Above 


KIAA1091 


AB029014 


Above 


PECAM 


L34657 


Above 


WASF1 


D87459 




SNL 


U03057 


Above 


STATI2 


AF037989 


Above 


DR1 


M97388 


Above 


KIAA0005 


D13630 


Below 


TRAF5 


AB000509 


Above 


PAK1 


U24152 


Above 
51


1373_at 


transcription factor 3 E2A 


TCF3 


M31523 


Below 






immunoglobulin enhancer binding 












factors E12/E47 






Above 


52 


35731_at


integrin alpha 4 antigen CD49D alpha ITGA4


X16983 






4 subunit of VLA-4 receptor 








53 


38659_at


suppressor of clear C. elegans 


SHOC2 


AB020669 


Below 






homolog of 












Table 10. Gene selected by CFS for E2A-PBX1

    Affymetrix number  Gene Name                           GeneSymbol  Reference  Above/Below Mean
1   33355_at           Homo sapiens cDNA FLJ12900 fis      PBX1        AL049381   Above
                       clone NT2RP2004321 (by CELERA
                       search of target sequence = PBX1)












Table 11. Genes selected by CFS for Hyperdiploid >50

Affymetrix number    Gene Name    GeneSymbol    Reference number    Above/Below Mean


1 


36620_at 


superoxide dismutase 1 soluble 


SOD1 


X02317 


Above 






amyotrophic lateral sclerosis 1 adult 








2 


37350_at 


clone 889N15 on chromosome 


PSMD10 


AL031177 


Above 






Xq22.1-22.3. Contains part of the












gene for a novel protein similar to X. 












laevis Cortical Thymocyte Marker 












CTX 








3 


41/ z4 at 


accessory proteins BAP31/BAP29


DXS1357E 


X81109 


Above 


4 


38738_at 


SMT3 suppressor of mif two 3 yeast 


SMT3H1 


X99584 


Above 






homolog 1 








5 


40480_s_at 


FYN oncogene related to SRC FGR 


FYN 


M14333 


Above 






YES 








6 


38518_at


sex comb on midleg Drosophila like 2 SCML2 


Y18004 


Above 


7 


31492_at 


muscle specific gene 


M9 


AB019392 


Below


8 


35688_g_at 


mature T-cell proliferation 1 


MTCP1 


Z24459 


Above 


9 


35939_s_at 


POU domain class 4 transcription 


POU4F1 


L20433 


Above 






factor 1 








10 


36128_at


transmembrane trafficking protein 


TMP21 


L40397 


Above 




37014_at


myxovirus influenza resistance 1 


MX1 


M33882 


Above 






homolog of murine interferon- 












inducible protein p78 








12 


34374_g_at


upstream regulatory element binding 


UREB1 


Z97054 


Above 






protein 1 








13 


688_at 


proteasome prosome macropain 26S 


PSMC1 


L02426 


Above 






subunit ATPase 1 








14 


39878_at 


protocadherin 9 


PCDH9 


AI524125 


Below 


15 


38771_at


histone deacetylase 1 


HDAC1 


D50405 


Below 
16


865_at 


ribosomal protein S6 kinase 90kD 


RPS6KA3 


U08316 


Above 






polypeptide 3 








17 


41143_at 


calmodulin (CALM1) gene 




U12022


Above 


18 


39867_at 


Tu translation elongation factor 


TUFM 


S75463 


Below 






mitochondrial 






Above 


19 


41470_at 


prominin mouse like 1 


PROML1 


AF027208 


20 


41503_at 


KIAA0854 protein 


KIAA0854 


AB020661 


Below 


21 


2039_s_at 


FYN oncogene related to SRC FGR 


FYN 


M14333 


Above 






YES 






Above 


22 


36845_at 


KIAA0136 protein


KIAA0136 


D50926 


23 


36940_at 


TGFB1-induced anti-apoptotic factor


TIAF1 


D86970 


Above 


24 


32236_at 



ubiquitin-conjugating enzyme E2G 2 


UBE2G2 


AF032456 


Above 






homologous to yeast UBC7 








25 


36885_at 


spleen tyrosine kinase 


SYK 


L28824 


Below 


26 


40200_at 


heat shock transcription factor 1 


HSF1 


M64673 


Below 


27 


40842_at 


U1 snRNP-specific protein A gene


SNRPA 


M60784 




28 


40514_at


hypothetical 43.2 Kd protein 


LOC51614 


AF091085 


Below 


29 


_at 


signal transducer and activator of 


STAT6 


AF067575 


Below 






transcription 6 (STAT6) gene 








30 


1294_at 


ubiquitin-activating enzyme E1-like


UBE1L 


L13852 


Below 


31 


34315_at 


AFG3 ATPase family gene 3 yeast 


AFG3L2 


Y18314 


Above 






like 2 






Above 


32 


39806_at 


DKFZP547E2110 protein 


DKFZP547E21 AL050261 


33 


40875_s_at 


small nuclear ribonucleoprotein 70kD SNRP70 


X06815 


Below 






polypeptide RNP antigen 








34 


38458_at 


cytochrome b5 (CYB5) gene 


CYB5 


L39945 


Above 


35 


1817_at 


prefoldin 5 


PFDN5 


D89667 


Below 


36 


34709_r_at 


stromal antigen 2 


STAG2 


Z75331 




37 


33447_at 


myosin light polypeptide regulatory 


MLCB 


X54304 


Above 






non-sarcomeric 20kD 








38 


1077_at 


recombination activating gene 1 


RAG1 


M29474 


Below 




1915_s_at


v-fos FBJ murine osteosarcoma viral 
oncogene homolog 


FOS 


V01512 


Above 


40 


38854_at 


KIAA0635 gene product 


KIAA0635 


AB014535 


Above 


41 


37732_at 


RING1 and YY1 binding protein 


RYBP 


AL049940 


Above 


42 


35940_at 


POU domain class 4 transcription 


POU4F1 


X64624 


Above 






factor 1 








43 


34733_at 


splicing factor 3a subunit 1 120kD 


SF3A1 


X85237 


Below 


44 


245_at 


selectin L lymphocyte adhesion 


SELL 


M25280 


Below 


45 


40146_at 


molecule 1 

RAP1B member of RAS oncogene


RAP1B


AL080212 


Below 


46 


40104_at 


family

serine/threonine kinase 25 Ste20 yeast STK25 


D63780 


Below 






homolog 








47 


430_at 


nucleoside phosphorylase 


NP 


X00737 


Above 
48


36899_at 


special AT-rich sequence binding 


SATB1 


M97287 


Below 






protein 1 binds to nuclear 












matrix/scaffold-associating DNA s 








49 


35727_at


hypothetical protein FLJ20517 


FLJ20517 


AI249721 


Below 


50 


38649_at 


KIAA0970 protein 


KIAA0970 


AB023187 


Below 


51 


36107_at 


ATP synthase H transporting 


ATP5J 


AA845575 








mitochondrial F0 complex subunit F6 








52 


38789_at 


transketolase Wernicke-Korsakoff 


TKT 


L12711 


Below 






syndrome 






Below 


53 


39301_at 


calpain 3 p94 


CAPN3 


X85030 


54 


41278_at 


BAF53 


BAF53A 


AF041474 


Below 


55 


41162_at 


protein phosphatase 1G formerly 2C 
magnesium-dependent gamma 


PPM1G 


Y13936 


Below 






isoform 






Below 


56 


37819_at 


hypothetical protein 


LOC54104 


AF007130 


57 


38717_at 


DKFZP586A0522 protein 


DKFZP586 


AL050159 


Below 








A0522 






58 


40019_at


ecotropic viral integration site 2B 


EVI2B 


M60830 


Above 


59 


39489_g_at 


protocadherin 9 


PCDH9 


W27720 


Below 


60 


857_at 


protein phosphatase 1A formerly 2C


PPM1A 


S87759 


Above 






magnesium-dependent alpha isoform 








61 


32804_at 


RNA binding motif protein 5 


RBM5 


AF091263 


Below 


62 


37676_at 


phosphodiesterase 8A 


PDE8A 


AF056490 


Below 


63 


1519_at 


v-ets avian erythroblastosis virus E26 


ETS2 


J04102 








oncogene homolog 2 








64 


37680_at 


A kinase PRKA anchor protein gravin AKAP12 


U81607 


Below 


65 


548_s_at 


1Z 

spleen tyrosine kinase 


SYK 


S80267 


Below 


66 


39797_at 


KIAA0349 protein 


KIAA0349 


AB002347 


Above 


67 


32789_at 


nuclear cap binding protein subunit 2 


NCBP2 


AA149428 


Below 


68 


38091_at


20kD 

lectin galactoside-binding soluble 9 


LGALS9 


Z49107


Below 






galectin 9 






Below 


69 


41223_at 


cytochrome c oxidase subunit Va 


COX5A 


M22760 


70 


933_f_at


zinc finger protein 91 HPF7 HTF10 


ZNF91 


L11672


Below 


71 


37012_at 


capping protein actin filament muscle 


CAPZB 


U03271 


Below 






Z-line beta 








72 


35214_at 


UDP-glucose dehydrogenase 


UGDH 


AF061016


Above 


73 


32434_at 


myristoylated alanine-rich protein 


MACS 


D10522


Above 






kinase C substrate MARCKS 80K-L 








74 


38345_at 


centrosomal protein 1 


CEP1 


AF083322 


Below 


75 


40404_s_at 


CDC16 cell division cycle 16 S. 


CDC16 


U18291 








cerevisiae homolog 








76 


39096_at 


SON DNA binding protein 


SON 


AB028942 


Above 


77 


33429_at 


DKFZP586M1523 protein 


DKFZP586M1 AL050225 


Above 


78 


40641_at 


TBP-associated factor 172 


TAF-172 


AF038362 


Above 


79 


41381_at


KIAA0308 protein 


KIAA0308 


AB002306 


Below 
84 38792_at

85 32643_at 



Homo sapiens Similar to CGI 5084 
gene product clone MGC 1047 1 
mRNA complete cds 



runt-related transcription factor 1 RUNX1

acute myeloid leukemia 1 aml1

oncogene 

caspase 4 apoptosis-related cysteine CASP4 
protease 

primase polypeptide 2A 58kD PRIM2A
spermine synthase SMS 
glucan 1 4-alpha- branching enzyme 1 GBE1 
glycogen branching enzyme Andersen 
disease glycogen storage disease type 
IV 



U28014 

X74331 

AD001528 

L07956 



Below 

Above 
Above 
Below 



86 38808_at cell membrane glycoprotein 110000M GP110



87 36062_at 

88 300_f_at



89 1979_s_at 

90 32230_at 



92 34651_at

93 1052_s_at 



94 36272_r_at 

95 2044_s_at 



Leupaxin LPXN AF062075
transcription factor BTF3 homolog HG4518-

(GBM90355) HT4921 

nucleolar protein 1 120kD NOL1 X55504

eukaryotic translation initiation factor EIF3S2 U39067 
3 subunit2beta36kD 

guanine nucleotide binding protein G GNG7 AB010414

protein gamma 7 

catechol-O-methyltransferase COMT M58525 

CCAAT/enhancer binding protein CEBPD M83667 
C/EBP delta 

peripheral myelin protein 2 PMP2 X62167

retinoblastoma 1 including RB1 M15400
osteosarcoma 

sterol regulatory element binding SREBF1 U00968
transcription factor 1 



Below 
Below 



Below 
Below 



Above 
Below 



Below 
Below 



Table 12. Genes selected by CFS for MLL

Affymetrix number    Gene Name    GeneSymbol    Reference



34306_at 
40797_at



32193_at 
40518_at 



muscleblind Drosophila like MBNL 
a disintegrin and metalloproteinase ADAM10 
domain 10 

LGALS1 Lectin, galactoside-binding, LGALS1 
soluble, 1 (galectin 1) 

S100 calcium-binding protein A10 S100A10 
annexin II ligand calpactin I light 
polypeptide p11

insulin-like growth factor binding IGFBP7 
protein 7 

plexin C1 PLXNC1
protein tyrosine phosphatase receptor PTPRC 




AB007888
AF009615 



AF030339 
Y00062 



Above/ 
Below 
Mean 

Above 
Above 



Above 
Above 
8


36777_at 


DNA segment on chromosome 12 


D12S2489E 


AJ001687 


Above 






unique 2489 expressed sequence 








9 


38391_at 


capping protein actin filament 


CAPG 


M94345 


Above 






gelsolin-like 






Above 


10 


40763_at


Meis1 mouse homolog


MEIS1 


U85707 


11 


34721_at 


FK506-binding protein 5 




U42031


Above 


12 


37809_at 


homeo box A9 


HOXA9 


U41813 


Above 
Above


13 


32215_i_at 


KIAA0878 protein 


KIAA0878 


AB020685 




14 


38160_at 


lymphocyte antigen 75 


LY75 


AF011333 


Above 


15 


1389_at


membrane metallo-endopeptidase 


MME 


J03779 


Below 






neutral endopeptidase enkephalinase












CALLA CD10 








16 


34168_at 


deoxynucleotidyltransferase terminal


DNTT 


M11722


Below 


17 


40522_at 


glutamate-ammonia ligase glutamine


GLUL 


X59834 








synthase 






Above 


18 


854_at 


B lymphoid tyrosine kinase 




S76617 


19 


40067_at 


E74-like factor 1 ets domain 


ELF1 


M82882 


Above 






transcription factor 








20 


39756_g_at 


X-box binding protein 1 




Z93930 


Below 


21 


32134_at 


Testin


DKFZP586 


AL050162 


Above 








B2022 






22 


39379_at 


Homo sapiens mRNA cDNA 




AL049397 


Above 






DKFZp586C1019 from clone 












DKFZp586C1019 








23 


40415_at 


acetyl-Coenzyme A acyltransferase 1 


ACAA1 


X14813 


Above 






peroxisomal 3-oxoacyl-Coenzyme A 












thiolase 








24 


40519_at 


protein tyrosine phosphatase receptor 


PTPRC 


Y00638 


Above 






type C






Above 


25 


33847_s_at 


cyclin-dependent kinase inhibitor 1B


CDKN1B 








p27 Kip1








26 


32696_at 


pre-B-cell leukemia transcription 


PBX3 


X59841 


Above 






factor 3 






Above 


27 


40417_at 


KIAA0098 protein 




D43950 


28 


1644_at 


eukaryotic translation initiation factor EIF3S2 


U36764 


Above 






3 subunit2beta36kD 








29 


948_s_at 


peptidylprolyl isomerase D 


PPID


D63861 


Above 






cyclophilin D 






Below 


30 


34337_s_at 


putative DNA binding protein 


M96 


AJ010014 


31 


41747_s_at 


myocyte-specific enhancer factor 2A 


MEF2A 




Above 






(MEF2A) gene








32 


39516_at 


hypothetical protein 


HSPC004 


AI827793 


Above 


33 


31820_at 


hematopoietic cell-specific Lyn 


HCLS1 


X16663 


Above 






substrate 1 








34 


33305_at 


serine or cysteine proteinase inhibitor 


SERPINB1 


M93056 


Above 






clade B ovalbumin member 1 








35 


40520_g_at 


protein tyrosine phosphatase receptor 


PTPRC 


Y00638 


Above 






type C
36


41222_at 


signal transducer and activator of 


STAT6 


AF067575 


Above 






transcription 6 (STAT6) gene 








37 


1718_at 


actin related protein 2/3 complex 


ARPC2 


U50523 


Above 






subunit 2 34 kD 








38 


383 _at 


KIAA0239 protein 


KIAA0239 


D87076 


Below 


39 


38805_at 


TG-interacting factor TALE family


TGIF


X89750 


Below 


40 


32089_at 


homeobox

sperm associated antigen 6 


SPAG6 


AF079363 


Above 


41 


1950_s_at 


Smad3, exon 1 




AB004922 


Above 


42 


39410_at 


development and differentiation 


DDEF2 


ArSUU/SOU 


Above 






enhancing factor 2 








43 


37280_at 


MAD mothers against 


MADH1 


U59912 


Below 






decapentaplegic Drosophila homolog 








44 


32607_at 


1 

brain acid-soluble protein 1 


BASP1 


AF039656 


Above 


45 


39389_at 


CD9 antigen p24 


CD9 


M38690 


Below 


46 


40913_at 


ATPase Ca transporting plasma 


ATP2B4 


W28589 


Below 






membrane 4 








47 


1039_s_at 


hypoxia-inducible factor 1 alpha 


HIF1A 


U22431 


Below 






subunit basic helix-loop-helix 












transcription factor 






Below 


48 


35939_s_at 




POU4F1 


L20433 






factor 1 






Below 


49 


963_at 


ligase IV DNA ATP-dependent 


LIG4 


X83441 


50 


39628_at 


RAB9 member RAS oncogene family 


RAB9 








38242_at




SLP65 


AF068180 


Below 


52 


37692_at 


diazepam binding inhibitor GABA


DBI 


AI557240 


Above 






receptor modulator acyl-Coenzyme A 












binding protein 






Above 


53 


32166_at


KIAA1027 protein 


KIAA1027 


AB028950 


54 


34800_at


DKFZP586O1624 protein


DKFZP586O16 AL039458


Below 


55 


34386_at 


methyl-CpG binding domain protein 4 MBD4 


AF072250 


Below 


56 


40296_at


hypothetical protein 


753P9 


AL023653 


Below 


57 


40456_at 


up-regulated by BCG-CWS 


LOC64116 


AL049963 


Above 


58 


33943_at 


ferritin heavy polypeptide 1 


FTH1 


L20941 


Below 


59 


39049_at 


G18.1a and G18.1b proteins (G18.1a




AJ243937 


Below 






and G18.1b genes, located in the class












III region of the major 












histocompatibility complex) 






Above 


60 


38075_at 


synaptophysin-like protein


SYPL 


X68194 


61 


932_i_at 


zinc finger protein 91 HPF7 HTF10 


ZNF91 


L11672


Below 


62 


1825_at 


IQ motif containing GTPase 


IQGAP1 


L33075 


Above 






activating protein 1 








63 


34210_at 


CDW52 antigen CAMPATH-1 


CDW52 


N90866 


Below 






antigen 






Below 


64 


39778_at 


mannosyl alpha-1 3-glycoprotein


MGAT1 


M55621 






beta-1 2-N- 












acetylglucosaminyltransferase 








65 


34699_at 


CD2-associated protein 


CD2AP 


AL050105 


Below 
66


40066_at 


ubiquitin-activating enzyme E1C 


UBE1C 


AF046024 


Above 






homologous to yeast UBA3 








67 


41177_at 


hypothetical protein FLJ12443 


FLJ12443 


AW024285 


Above 


68 


32736_at 


HSPC022 protein 


HSPC022 


W68830 


Above 


69 


1928_s_at 


mad protein homolog Smad2 gene 


Smad2 


U78733 


Below 


70 


1081_at 


ornithine decarboxylase 1 


ODC1 


M33764 


Above 


71 


37345_at 


Calumenin 


CALU 


AF013759 




72 


34099_f_at


nucleosome assembly protein 1-like 1


NAP1L1 


W26056 


Above 


73 


933_f_at


zinc finger protein 91 HPF7 HTF10




L11672


Below


74 


32214_at 


thioredoxin-like 32kD 


TXNL 


AF003938 




75 


33501_r_at 


SNC73 protein SNC73 mRNA




S71043 


Below 


76 


950_at 


complete cds 
translocation protein 1 


TLOC1 


D87127 


Below 


77 


41161_at


death-associated protein 6 


DAXX 


AB015051


Below 


78 


41381_at 


KIAA0308 protein 


KIAA0308 


AB002306 


Below 


79 


38705_at 


ubiquitin-conjugating enzyme E2D 2 


UBE2D2 


AI310002


Above 






homologous to yeast UBC4/5 








80 


38617_at 


LIM domain kinase 2 


LIMK2 


D45906 


Below 


81 


34305_at 


poly rC binding protein 1 


PCBP1 


Z29505 


Above 


82 


40436_g_at 


solute carrier family 25 mitochondrial SLC25A6


J03592 


Above 




carrier adenine nucleotide translocator 








1827_s_at


member 6

c-myc-P64 mRNA, initiating from 




M13929 


Above 






promoter P0 








84 


38479_at


acidic protein rich in leucines 


SSP29 


Y07969 


Below 


85 


33207_at


DnaJ Hsp40 homolog subfamily C 


DNAJC3 


AI095508 


Below 






member 3 








86 


39039_s_at 


CGI-76 protein 


LOC51632 


AI557497 


Below 


87 


32157_at


protein phosphatase 1 catalytic 


PPP1CA 


S57501 


Above 






subunit alpha isoform 








88 


905_at 


guanylate kinase 1 


GUK1 


L76200 


Below 


89 


35794_at 


KIAA0942 protein 


KIAA0942


AB023159 


Below 


90 


1007_s_at 


discoidin domain receptor family 


DDR1 


U48705 


Below 






member 1 








91 


39424_at 


tumor necrosis factor receptor 


TNFRSF14 


U70321 


Below 






superfamily member 14 herpesvirus 












entry mediator 








92 


36634_at 


BTG family member 2 


BTG2 


U72649 


Below


93 


38760_f_at 


butyrophilin subfamily 3 member A2 


BTN3A2 


U90546 


Below 






Table 13. Genes selected by CFS for Novel Class

    Affymetrix number  Gene Name                                               GeneSymbol   Reference number  Above/Below Mean
1   37960_at           carbohydrate chondroitin 6/keratan sulfotransferase 2   CHST2        AB014679          Above
2   31892_at           protein tyrosine phosphatase receptor type M            PTPRM        X58288            Above
3   994_at             protein tyrosine phosphatase receptor type M            PTPRM        X58288            Above
4   995_g_at           protein tyrosine phosphatase receptor type M            PTPRM        X58288            Above
5   41074_at           G protein-coupled receptor 49                           GPR49        AF062006          Above
6   41073_at           G protein-coupled receptor 49                           GPR49        AI743745          Above
7   34676_at           KIAA1099 protein                                        KIAA1099     AB029022          Above
8   36139_at           DKFZP586G0522 protein                                   DKFZP586G05  AL050289          Above
9   37542_at                                                                   LHFPL2       D86961            Above
10                     clathrin heavy polypeptide Hc                           CLTC         D21260            Above
11  32800_at           retinoid X receptor alpha mRNA                                       U66306            Above
12  1664_at            insulin-like growth factor 2                            IGF2         HG3543-HT3739     Above
13  36566_at           cystinosis nephropathic                                 CTNS         AJ222967          Above






Table 14. Gene selected by CFS for T-ALL

    Affymetrix number  Gene Name                                    GeneSymbol  Reference number  Above/Below Mean
1   38319_at           CD3D antigen delta polypeptide TiT3 complex  CD3D        AA919102          Above












Table 15. Genes selected by CFS for TEL-AML1

Affymetrix number    Gene Name    GeneSymbol    Reference number    Above/Below Mean




38652_at


hypothetical protein FLJ20154 


FLJ20154 


AF070644 


Above 


2 


36239_at 


POU domain class 2 associating 


POU2AF1 


Z49194 


Above 






factor 1 








3 


41442_at 


core-binding factor runt domain alpha 


CBFA2T3 


AB010419 


Above 






subunit 2 translocated to 3 








4 


37780_at 


piccolo presynaptic cytomatrix 


PCLO 


AB011131 


Above 






protein 








5 


36985_at


isopentenyl-diphosphate delta 


IDI1 


X17025 


Above 






isomerase 








6 


38578_at 


tumor necrosis factor receptor 


TNFRSF7 


M63928 


Above 






superfamily member 7 








7 


35614_at 


transcription factor-like 5 basic helix- TCFL5 


AB012124 


Above 






loop-helix 






8 


32224_at 


KIAA0769 gene product 


KIAA0769 


AB018312 


Above 


9 


32730_at 


KIAA1750 protein




AL080059 


Above 


10 


36937_s_at 


PDZ and LIM domain 1 elfin 


PDLIM1 


U90878 


Below 


11 


36008_at 


protein tyrosine phosphatase type IVA PTP4A3 
member 3 


AF041434 


Above 


12 


41200_at 


CD36 antigen collagen type I receptor CD36L1 


Z22555 


Above 






thrombospondin receptor like 1 
13


33690_at


DKFZp434A202 from clone 




AL080190 








DKFZp434A202 








14 


755_at


inositol 1 4 5-triphosphate receptor


ITPR1 


D26070 


Above 






type 1 








15 


41097_at 


telomeric repeat binding factor 2 


TERF2 


AF002999 


Above 


16 


160029_at 


protein kinase C beta 1 


PRKCB1


X07109 


Above 


17 


34481_at


vav proto-oncogene 


Vav 


AF030227 


Above 


18 


41498_at 


KIAA0911 protein


KIAA0911


AB020718 


Above 


19 


37280_at 


MAD mothers against 


MADH1 


U59912 


Above 






decapentaplegic Drosophila homolog 1








20 


1647_at 


IQ motif containing GTPase 


IQGAP2 


U51903 


Below 






activating protein 2 








21 


37724_at 


v-myc avian myelocytomatosis viral 


MYC 


V00568 


Below 






oncogene homolog 








22 


37981_at 


drebrin 1 


DBN1 


U00802 


Above 


23 


37326_at 


proteolipid protein 2 colonic 


PLP2 


U93305 


Below 






epithelium-enriched 








24 


37344_at 


major histocompatibility complex 


HLA-DMA 


X62744 


Above 






class II DM alpha 








25 


38666_at 


pleckstrin homology Sec7 and 


PSCD1 


M85169 


Below 






coiled/coil domains 1 cytohesin 1 








26 


39039_s_at 


CGI-76 protein 


LOC51632 


AI557497 


Below 


27 


34819_at 


CD164 antigen sialomucin


CD164


D14043


Below 


28 


40729_s_at 


nuclear factor of kappa light 


NFKBIL1 


Y14768 


Above 


















inhibitor-like 1 








29 


34224_at 


fatty acid desaturase 3 


FADS3 


AC004770 


Above 


30 


39827_at




FLJ20500 


AA522530 


Below 


31 


32157_at 


protein phosphatase 1 catalytic 


PPP1CA 


S57501 


Below 






subunit alpha isoform 








32 


34183_at 


DKFZP434C171 protein 


DKFZP434C17 AL080169 


Below 


33 


39329_at


actinin alpha 1 


ACTN1 


X15804 


Below 


34 


38124_at 


midkine neurite growth-promoting 


MDK 


X55110 


Above 






factor 2 








35 


33304_at 


interferon stimulated gene 20kD 


ISG20 


U88964 


Above 


36 


41295_at 


GTT1 protein 


GTT1 


AL041780 


Below 


37 


40745_at 


adaptor-related protein complex 1 


AP1B1 


L13939 


Above 






beta 1 subunit 








38 


38906_at 


spectrin alpha erythrocytic 1 


SPTA1 


M61877 


Above 






elliptocytosis 2 








39 


263_g_at 


S-adenosylmethionine decarboxylase
1 


AMD1 


M21154 


Below 


40 


41609_at 


major histocompatibility complex 


HLA-DMB 


U15085 


Above 






class II DM beta 








41 


39045_at 


hypothetical protein FLJ21432 


FLJ21432 


W26655 


Below 
42


39421_at


runt-related transcription factor 1 


RUNX1 


D43969 


Above 






acute myeloid leukemia 1 aml1












oncogene 








43 




CDW52 antigen CAMPATH-1


CDW52 


N90866 


Above 






antigen 








44 


37276_at 


IQ motif containing GTPase 


IQGAP2 


U51903 


Below 






activating protein 2 








45 


38763_at 


L-iditol-2 dehydrogenase gene 




L29254 


Below 


46 


40960_at 


UDP-Gal betaGlcNAc beta 1 4- 


B4GALT1 


D29805 


Below 






galactosyltransferase polypeptide 1 








47 


1127_at 


ribosomal protein S6 kinase 90kD 


RPS6KA1 


L07597 


Below 






polypeptide 1 








48 


37359_at


KIAA0102 gene product 


KIAA0102 


D14658 


Below 


49 


38968_at 


SH3-domain binding protein 5 BTK- 


SH3BP5 


AB005047 


Below 






associated 








50 


39135_at


KIAA0767 protein 


KIAA0767


AB018310 


Below 


51 


36128_at


transmembrane trafficking protein 


TMP21 


L40397 


Below 




8_s_at 


calmodulin 3 phosphorylase kinase 


CALM3 


J04046 


Above 






delta








53 


34782_at 


jumonji mouse homolog 


JMJ 


AL021938 


Below 


54 


37893_at 


protein tyrosine phosphatase non- 


PTPN2 


AI828880 


Below 






receptor type 2 










3V / jo_l_ai 


Lysosomal-associated membrane 


LAMP1 


J04182 


Below 






protein 1 








56 


35151_at 


tumor suppressor deleted in oral 


DOC-1R 


AF089814 


Below 






cancer-related 1 








57 


38096_f_at 


major histocompatibility complex 


HLA-DPB1


M83664 


Above 






class II DP beta 1 








58 


40467_at 


succinate dehydrogenase complex 


SDHD 


AB006202 


Below 






subunit D integral membrane protein 








59 


39712_at 


S100 calcium-binding protein A13


S100A13 


AI541308 


Below 


60 


41812_s_at


KIAA0906 protein 


KIAA0906 


AB020713 


Below 


61 




lysyl-tRNA synthetase 


KARS 


D32053 


Below 


62 


38336_at


KIAA1013 protein 


KIAA1013 


AB023230 


Below 


63 


32253_at 


arginine-glutamic acid dipeptide RE 


RERE 


AB007927 


Below 






repeats 








64 


35731_at


integrin alpha 4 antigen CD49D alpha ITGA4 


X16983 


Below 






4 subunit of VLA-4 receptor 








65 


40698_at 


C-type calcium dependent 


CLECSF2 


X96719 


Below 






carbohydrate-recognition domain 












lectin superfamily member 2 












activation-induced 








66 


840_at


zinc finger protein 220 


ZNF220 






67 


41171_at 


proteasome prosome macropain 


PSME2 


D45248 


Above






activator subunit 2 PA28 beta 








68 


34877_at 


Janus kinase 1 a protein tyrosine 
kinase 


JAK1 


AL039831 


Above 


69 


37190_at 


WAS protein family member 1 


WASF1 


D87459 


Below 


70 


31690_at 


Glutamate dehydrogenase-2 


GLUD2 


U08997


Below 



WO 03/083140 
71


40961_at 


SWI/SNF related matrix associated 


SMARCA2 


X72889 


Below 






actin dependent regulator of 












chromatin subfamily a member 2 








72 


38149_at 


KIAA0053 gene product 


KIAA0053 


D29642 


Above 


73 


2061_at 


integrin alpha 4 antigen CD49D alpha ITGA4 


L12002 


Below 






4 subunit of VLA-4 receptor 








74 


2012_s_at 


protein kinase DNA-activated 


PRKDC 


U34994 


Below 






catalytic polypeptide 








75 


36878_fat 


major histocompatibility complex 


HLA-DQB1 


M60028 


Above 






class II DQ beta 1 








76 


34821_at 


DKFZP586D0623 protein 


DKFZP586D06 AL050197 


Below 


77 


36980_at 


proline-rich protein with nuclear 
taraetins signal 


B4-2 


U03105 


Below 


78 


853_at 


nuclear factor erythroid-derived 2 like NFE2L2 


S74017 


Below 


79 


39320_at 


caspase 1 apoptosis-related cysteine 


CASP1 


U13697 


Below 






protease interleukin 1 beta convertase 








80 


32572_at 


ubiquitin specific protease 9 X 


USP9X 


X98296 


Below 






chromosome Drosophila fat facets 












related 








81 


387_at 


cyclin-dependent kinase 9 CDC2- 


CDK9 


X80230 


Below 






related kinase 








82 


35300_at 


glutamyl-prolyl-tRNA synthetase 


EPRS 


X54326 


Below 


83 


36155_at 


KIAA0275 gene product 


KIAA0275 


D87465 


Below 


84 


37625_at 


Interferon regulatory factor 4 


IIRF4 


U52682 


Below 


85 


35763_at 


K1AA0540 protein 


KIAA0540 


AB011112 


Below 


86 


39077_at 


DRl-associated protein 1 negative 


DRAP1 


U41843 


Below 






cofactor 2 alpha 








87 


40132_g_at FoUistatin-like 1 


FSTL1 


D89937 


Below 


88 


32615_at 


aspartyl-tRNA synthetase 


DARS 


J05032 


Below 


89 


38357_at 


Homo sapiens mRNA cDNA 




AL049321 


Above 






DKFZp564D156 from clone 












DKFZp564D156 








90 


34817_s_at 


ataxin 2 related protein 


A2LP 


U70671 


Above 


91 


40856_at 


serine or cysteine proteinase inhibitor 


SERPINF1 


U29953 


Below 






clade F alpha-2 antiplasmin pigment 












epithelium derived factor member 1 








92 


39784_at 


eukaryotic translation initiation factor 


EIF2S1 


U26032 


Below 






2 subunit 1 alpha 35kD 








93 


37600_at 


extracellular matrix protein 1 


ECM1 


U68186 


Below 


94 


40839„at 


ubiquitin-like 3 


UBL3 


AL080177 


Below 


95 


34832_s_at 


KIAA0763 gene product 


KIAA0763 


AB018306 


Below 


96 


33244_at 


chimerin chimaerin 2 


CHN2 


U07223 


Below 


97 


31516_f_at 


basic transcription factor 3 like 1 


BTF3L1 


M90354 


Below 


98 


35266_at 


bladder cancer associated protein 


BLCAP 


AL049288 


Above 
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99 


253_g_at 


(clone GPCR W) G protein-linked 




L42324 


Below 






receptor gene (GPCR) gene 








100 


35227_at 


retinoblastoma-binding protein 8 


RBBP8 


U72066 


Below 


101 


41073_at 


G protein-coupled receptor 49 


GPR49 


AI743745 




102 


38084_at 


chromobox homolog 3 Drosophila 


CBX3 


AI797801 


Below 






HP1 gamma 








103 


39025_at 


6.2 kd protein 


LOC54543 


AI557912 


Below 


104 


32085_at 


KIAA0981 protein 


KIAA0981 


AB023198 


Above 


105 


38902_r_at 


Activating transcription factor 2 


ATF2 


X15875 


Below 



3. T-statistics 

T-statistics is a classical feature selection approach. The t-statistic of a gene is
defined as $T = |\mu_1 - \mu_2| / \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, where $\mu_i$ is the mean
expression of that gene in the i-th class, $\sigma_i^2$ is the variance of that gene in the
i-th class, and $n_i$ is the size of the i-th class. This formula assigns a higher value to
a gene that has a larger mean difference between the two classes and a smaller
variance within both classes. For BCR-ABL, hyperdiploid >50, MLL, Novel, and
TEL-AML1 the top ranked 40 genes are listed in Tables 16, 18, 19, 20, and 22,
whereas for E2A-PBX1 and T-ALL only the top 30 and 31 genes are shown. Additional
genes that may be used in expression profiles to assign subjects to a leukemia
risk group are shown in Tables 54-60. The
genes in Tables 54-60 were selected on the basis of having a T-statistic value greater
than the T-statistic value for the gene when examined as a discriminator in 999 of 1000
permutations of the data set (p<0.001; this statistical test is described elsewhere
herein). Of these genes, only those having a T-statistic absolute value equal to or
greater than 8 (representing a nominal p value of ~0.0001) are shown in Tables
54-60.
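
The permutation criterion described above can be illustrated with a short Python sketch. It shuffles the class labels and counts how often the observed statistic exceeds the statistic obtained on the shuffled labels. The label-shuffling scheme, the function names `abs_t` and `passes_permutation_filter`, the use of sample variances, and the fixed random seed are all assumptions made for illustration; the test actually used is described elsewhere herein and may differ in detail.

```python
import math
import random
from statistics import mean, variance


def abs_t(values, labels, subtype):
    """Absolute two-class t-statistic for one gene (unpooled sample variances)."""
    a = [v for v, lab in zip(values, labels) if lab == subtype]
    b = [v for v, lab in zip(values, labels) if lab != subtype]
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(mean(a) - mean(b))
    if denom == 0.0:
        # Degenerate case with no within-class spread; this handling is an assumption.
        return math.inf if diff > 0 else 0.0
    return diff / denom


def passes_permutation_filter(values, labels, subtype,
                              n_perm=1000, min_wins=999, seed=0):
    """Return True if the observed statistic exceeds the permuted statistic
    in at least `min_wins` of `n_perm` random label permutations."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    observed = abs_t(values, labels, subtype)
    shuffled = list(labels)
    wins = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # permute class labels, keep class sizes
        if observed > abs_t(values, shuffled, subtype):
            wins += 1
    return wins >= min_wins
```

A gene whose expression separates the subtype cleanly from all other samples passes this filter, while a gene whose values are unrelated to the labels does not. The additional cutoff of an absolute T-statistic of 8 mentioned in the text would be applied as a separate comparison on the value returned by `abs_t`.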

Generally, using the top 20-40 genes did not result in significant changes to
subtype prediction accuracy. Accordingly, the top 20 genes were used for subtype
prediction, unless noted otherwise.
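
The selection of top-ranked genes can be sketched in Python as follows. This is a minimal illustration, not the implementation used to produce the tables: the names `t_statistic` and `rank_genes`, the dictionary input format, and the use of sample (n-1) variances are hypothetical. It also assumes that the sign of the statistic is kept so that it can be reported as Above or Below the mean of the remaining samples, with ranking done on the absolute value, which is how the tables that follow appear to be organized.

```python
import math
from statistics import mean, variance


def t_statistic(in_class, out_class):
    """Signed two-sample t-statistic with unpooled sample variances.

    Assumes each class has at least two samples and non-zero spread.
    """
    m1, m2 = mean(in_class), mean(out_class)
    v1, v2 = variance(in_class), variance(out_class)
    return (m1 - m2) / math.sqrt(v1 / len(in_class) + v2 / len(out_class))


def rank_genes(expression, labels, subtype, top_n=20):
    """Rank genes by |T| for one subtype versus all other samples.

    expression: dict mapping gene id -> list of expression values (one per sample)
    labels:     list of subtype labels, in the same sample order
    Returns a list of (gene id, T, "Above" or "Below") for the top_n genes.
    """
    ranked = []
    for gene, values in expression.items():
        inside = [v for v, lab in zip(values, labels) if lab == subtype]
        outside = [v for v, lab in zip(values, labels) if lab != subtype]
        t = t_statistic(inside, outside)
        ranked.append((gene, t, "Above" if t > 0 else "Below"))
    # Largest absolute statistic first, as in the tables.
    ranked.sort(key=lambda row: abs(row[1]), reverse=True)
    return ranked[:top_n]
```

With `top_n=20` this mirrors the default of using the top 20 genes for subtype prediction; passing `top_n=40` would correspond to the longer lists shown in the tables.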
Table 16. Genes Selected by T statistics for BCR-ABL





No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 12.0346 | Above
2 | 36194_at | low density lipoprotein-related protein-associated protein 1 alpha-2-macroglobulin receptor-associated protein 1 | LRPAP1 | M63959 | -11.3077 | 
3 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 10.6627 | Above
4 | 37397_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | 10.2460 | Above
5 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 10.0540 | Above
6 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | 9.9147 | Above
7 | 202_at | heat shock transcription factor 2 | HSF2 | M65217 |  | 
8 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | 9.6562 | Above
9 | 39691_at | SH3-containing protein SH3GLB1 | SH3GLB1 |  |  | 
10 | 2045_s_at | hemopoietic cell kinase | HCK | M16592 | -9.3898 | Below
11 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 9.3382 | Above
12 | 1386_at | protein tyrosine phosphatase non-receptor type 9 | PTPN9 | M83738 | -9.2414 | 
13 | 35991_at | Sm protein F | LSM6 | AA917945 | 9.0298 | Above
14 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 |  | 
15 | 35970_g_at | M-phase phosphoprotein 9 | MPHOSPH9 | N23137 | 8.6474 | Above
16 | 38636_at | immunoglobulin superfamily containing leucine-rich repeat | ISLR | AB003184 | 8.4291 | Above
17 | 36683_at | matrix Gla protein | MGP | AI953789 | -8.3872 | Below
18 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 8.2583 | Above
19 | 40798_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | 8.2283 | Above
20 | 41649_at | FOXJ2 forkhead factor | LOC55810 | AF038177 | 8.2275 | Above
21 | 38966_at | glycoprotein synaptic 2 | GPSN2 | AF038958 | 8.2080 | Above
22 | 34759_at | Human hbc647 mRNA sequence |  | U68494 | 8.1863 | Above
23 | 1434_at | phosphatase and tensin homolog mutated in multiple advanced | PTEN | U92436 | 8.1671 | Above
24 |  | CS box-containing WD protein | LOC55884 | AF038187 | 8.1655 | Above
25 |  | zinc finger protein-like 1 | ZFPL1 | AF001891 | 8.1384 | 
26 |  | KIAA0397 gene product | KIAA0397 | AB007857 | 8.0041 | Above
27 |  | E1A binding protein p300 | EP300 | U01877 | -7.7578 | Below
28 |  | centrosomal protein 1 | CEP1 | AF083322 |  | 
29 |  | myosin phosphatase target subunit 2 | MYPT2 | AB007972 | -7.7301 | Below
30 |  | protein phosphatase 2 regulatory subunit B B56 delta isoform | PPP2R5D | L76702 | -7.6161 | Below
31 |  | lymphocyte antigen 75 | LY75 | AF011333 |  | 
32 |  | ribonucleotide reductase M1 polypeptide | RRM1 | X59543 | 7.5778 | Above
33 |  | KIAA0692 protein | KIAA0692 | AB014592 | 7.4662 | Above
34 |  | RAN binding protein 2 | RANBP2 | D42063 | 7.4114 | Above
35 |  | nucleolar protein KKE/D repeat | NOP56 | Y12065 | 7.3622 | Above
36 |  | excision repair cross-complementing rodent repair deficiency complementation group 5 | ERCC5 | L20046 | 7.3597 | Above
37 | 41836_at | protein with polyglutamine repeat calcium ca2 homeostasis endoplasmic reticulum protein | ERPROT213 | U94836 | 7.3350 | Above
38 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | 7.3039 | Above
39 | 37047_at | Niemann-Pick disease type C1 | NPC1 | AF002020 | 7.2357 | Above
40 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | -7.2252 | Below

Affymetrix numbers listed for rows 24-36 (twelve entries for thirteen rows; row assignment not recoverable): 40167_s_at, 40264_g_at, 36129_at, 551_at, 38345_at, 41137_at, 38160_at, 34314_at, 39519_at, 32788_at, 34882_at, 2064_g_at

Table 17. Genes Selected by T statistics for E2A-PBX1

No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 126.7442 | Above
2 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 36.6116 | Above
3 |  | FAT tumor suppressor Drosophila homolog | FAT |  | 30.7577 | Above
4 | 717_at | GS3955 protein | GS3955 | D87119 | 23.7813 | Above
5 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | -22.8956 | Below
6 | 33641_g_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | -20.4637 | Below
7 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -20.1554 | Below
8 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 19.6467 | Above
9 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 18.8419 | Above






PCT/LS03/08486 



10 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 17.8214 | Above
11 | 37099_at | arachidonate 5-lipoxygenase-activating protein | ALOX5AP | AI806222 | -17.7944 | Below
12 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | -17.6553 | Below
13 | 37641_at | Human gene for hepatitis C-associated microtubular aggregate protein p44, exon 9 and complete cds. |  | D28915 | -17.3074 | Below
14 | 40113_at | GS3955 protein | GS3955 | D87119 | 16.7288 | Above
15 | 2031_s_at | cyclin-dependent kinase inhibitor 1A p21 Cip1 | CDKN1A | U03106 | -14.9826 | Below
16 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | -14.8016 | Below
17 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 14.7180 | Above
18 | 38510_at | Homo sapiens mRNA cDNA DKFZp586B0220 |  | AL049435 | -14.4522 | Below
19 | 268_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | -13.7540 | Below
20 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 13.6403 | Above
21 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | 13.5099 | Above
22 | 38580_at | guanine nucleotide binding protein G protein q polypeptide | GNAQ | U43083 | -12.8525 | Below
23 | 40049_at | death-associated protein kinase 1 | DAPK1 | X76104 | -12.3837 | Below
24 | 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | 12.3436 | Above
25 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 |  | AL049397 | 12.2102 | Above
26 | 430_at | nucleoside phosphorylase | NP | X00737 | 12.1307 | Above
27 | 37975_at | cytochrome b-245 beta polypeptide chronic granulomatous disease | CYBB | X04011 | -12.0743 | Below
28 | 34862_at | CGI-49 protein | LOC51097 | AA005018 | 12.0264 | Above
29 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | -11.9796 | Below
30 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | -11.9492 | Below
31 | 37304_at | chromobox homolog 1 Drosophila HP1 beta | CBX1 | U35451 | 11.9422 | Above
32 | 1287_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 11.9051 | Above
33 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 11.7327 | Above
34 | 596_s_at | colony stimulating factor 3 receptor granulocyte | CSF3R | M59820 | -11.6814 | Below
35 | 37493_at | colony stimulating factor 2 receptor beta low-affinity granulocyte-macrophage | CSF2RB | H04668 | 11.6620 | Above
36 | 36452_at | synaptopodin | KIAA1029 | AB028952 | 11.4021 | Above
37 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | 11.2865 | Above






38 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | -11.1361 | Below
39 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | 11.0984 | Above
40 | 36203_at | ornithine decarboxylase 1 | ODC1 | X16277 | 10.9475 | Above



Table 18. Genes Selected by T statistics for Hyperdiploid 



No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 |  |  | 
2 | 39878_at | protocadherin 9 | PCDH9 | [illegible] | -6.9008 | Below
3 | 37543_at | Rac/Cdc42 guanine exchange factor GEF 6 | ARHGEF6 | D25304 | 6.8366 | Above
4 | 41470_at | prominin mouse like 1 | PROML1 | AF027208 | 6.7290 | Above
5 |  | muscle specific gene | M9 | AB019392 |  | Below
6 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 6.4051 | 
7 | 1915_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS |  |  | 
8 | 37677_at | phosphoglycerate kinase 1 | PGK1 | V00572 | 6.2865 | Above
9 | 39867_at | Tu translation elongation factor mitochondrial | TUFM | S75463 | -6.2299 | Below
10 | 36795_at | prosaposin variant Gaucher disease and variant metachromatic leukodystrophy | PSAP | J03077 | 6.1812 | Above
11 | 40875_s_at | small nuclear ribonucleoprotein 70kD polypeptide RNP antigen | SNRP70 | X06815 | -6.0877 | Below
12 | 306_s_at | high-mobility group nonhistone chromosomal protein 14 | HMG14 | J02621 | 6.0804 | Above
13 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | X81109 | 6.0244 | Above
14 | 39168_at | Ac-like transposable element | ALTE | AB018328 | 5.9336 | Above
15 | 955_at | calmodulin type I | CALM1 | HG1862-HT1897 | 5.8650 | Above
16 | 38604_at | neuropeptide Y | NPY | AI198311 | 5.8313 | Above
17 | 39147_g_at | alpha thalassemia/mental retardation syndrome X-linked RAD54 S. cerevisiae homolog | ATRX | U72936 | 5.8181 | Above
18 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | -5.6901 | Below
19 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 5.6688 | Above
20 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 5.6605 | Above






21 |  | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | -5.5877 | Below
22 | 32553_at | MYC-associated zinc finger protein purine-binding transcription factor | MAZ | M94046 | -5.5000 | Below
23 |  | NADH dehydrogenase ubiquinone 1 alpha subcomplex 1 7.5kD MWFE | NDUFA1 | N47307 | 5.4376 | Above
24 |  | prefoldin 5 | PFDN5 | D89667 | -5.4110 | Below
25 |  | Human recombination activating protein (RAG2) gene, last exon | RAG2 | M94633 | -5.4026 | Below
26 | 1556_at | RNA binding motif protein 5 | RBM5 | U23946 | -5.3032 | Below
27 | 40998_at | trinucleotide repeat containing 11 THR-associated protein 230 kDa subunit | TNRC11 | AF071309 | 5.2349 | Above
28 | 37294_at | B-cell translocation gene 1 anti-proliferative | BTG1 | X61123 | -5.1877 | Below
29 | 1447_at | proteasome prosome macropain subunit beta type 1 | PSMB1 | D00761 | 5.1699 | Above
30 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | 5.1200 | Above
31 | 33307_at | kraken-like | BK126B4.1 | AL022316 | -5.0984 | Below
32 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | -5.0822 | Below
33 | 34336_at | lysyl-tRNA synthetase | KARS | D32053 | -5.0692 | Below
34 | 41143_at | Human calmodulin (CALM1) gene, exons 2,3,4,5 and 6, and complete cds | CALM1 | U12022 | 5.0543 | Above
35 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 5.0373 | Above
36 | 35298_at | eukaryotic translation initiation factor 3 subunit 7 zeta 66/67kD | EIF3S7 | U54558 | -4.9499 | Below
37 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | -4.9228 | Below
38 | 36629_at | glucocorticoid-induced leucine zipper | GILZ | AI635895 | 4.8061 | Above
39 | 39721_at | ephrin-B1 | EFNB1 | U09303 | 4.7968 | Above
40 | 2094_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | K00650 | 4.7446 | Above



Table 19. Genes Selected by T statistics for MLL 



No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | -16.8244 | Below
2 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | -15.4460 | Below
3 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | -13.6764 | Below
4 |  | Human macrophage mannose receptor (MRC1) gene, exon 30. | MRC1 | M93221 | -11.8629 | Below






5 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | AI535946 | 11.0223 | Above
6 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 10.4318 | Above
7 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | -10.1815 | Below
8 | 39721_at | ephrin-B1 | EFNB1 | U09303 | -9.6158 | Below
9 | 39402_at | interleukin 1 beta | IL1B | M15330 | -9.5998 | Below
10 | 1737_s_at | insulin-like growth factor-binding protein 4 | IGFBP4 | M62403 | -9.4119 | Below
11 | 37413_at | dipeptidase 1 renal | DPEP1 |  | -9.4101 | Below
12 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | [illegible] | 9.3163 | Above
13 | 1971_g_at | fragile histidine triad gene | FHIT | U46922 | -9.2257 | Below
14 | 1983_at | cyclin D2 | CCND2 | X68452 | -9.2213 | 
15 | 38869_at | KIAA1069 protein | KIAA1069 |  |  | Below
16 | 40520_g_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 9.1099 | Above
17 | 1718_at | actin related protein 2/3 complex subunit 2 34 kD | ARPC2 | U50523 | 9.0435 | Above
18 | 34237_at | HBS1 S. cerevisiae like | HBS1L | AB028961 | -8.8208 | Below
19 | 1726_at | DNA polymerase, epsilon, catalytic subunit |  | HG919-HT919 | -8.4664 | Below
20 | 36643_at | discoidin domain receptor family member 1 | DDR1 | L20817 | -8.4627 | Below
21 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | -8.3762 | Below
22 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 |  | AL049397 | 8.2974 | Above
23 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -8.1177 | Below
24 | 564_at | guanine nucleotide binding protein G protein alpha 11 Gq class | GNA11 | M69013 | -8.1107 | Below
25 | 39705_at | KIAA0700 protein | KIAA0700 | AB014600 | -7.9334 | Below
26 | 36105_at | Human nonspecific crossreacting antigen mRNA, complete cds. | NCA | M18728 | -7.6911 | Below
27 | 174_s_at | intersectin 2 | ITSN2 | U61167 | 7.5752 | Above
28 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | -7.4767 | Below
29 | 40436_g_at | solute carrier family 25 mitochondrial carrier adenine nucleotide translocator member 6 | SLC25A6 | J03592 | 7.3952 | Above
30 | 794_at | protein tyrosine phosphatase non-receptor type 6 | PTPN6 | X62055 | 7. | Above
31 | 38032_at | KIAA0736 gene product | KIAA0736 | AB018279 | -7.0718 | Below
32 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | 6.9829 | Above
33 | 41762_at | TIA1 cytotoxic granule-associated RNA-binding protein-like 1 | TIAL1 | D64015 | -6.9118 | Below














34 |  | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME |  | -6.7734 | Below
35 |  | leucine zipper down-regulated in cancer 1 | LDOC1 |  | -6.7415 | Below
36 | 188_at | ephrin-B1 | EFNB1 | U09303 | -6.5964 | Below
37 | 160033_s_at | X-ray repair complementing defective repair in Chinese hamster cells 1 | XRCC1 | NM_006297 | -6.5936 | Below
38 | 40913_at | ATPase Ca transporting plasma membrane 4 | ATP2B4 | W28589 | -6.5774 | Below
39 | 37398_at | platelet/endothelial cell adhesion molecule CD31 antigen | PECAM1 | AA100961 | -6.5675 | Below
40 |  | protein tyrosine phosphatase receptor type K | PTPRK |  | -6.5584 | Below



Table 20. Genes Selected by T statistics for Novel Risk Group 



No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | -40.5168 | Below
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 33.4654 | Above
3 |  | protein tyrosine phosphatase receptor type M |  | X58288 | 24.7557 | Above
4 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | 14.0491 | Above
5 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 11.4548 | Above
6 | 37960_at | carbohydrate chondroitin 6/keratan | CHST2 | AB014679 | 10.9971 | Above
7 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 10.0370 | Above
8 | 40585_at | adenylate cyclase 7 | ADCY7 | D25538 | -9.5897 | Below
9 | 33284_at | myeloperoxidase | MPO | M19507 | -9.4724 | Below
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | 9.4489 | Above
11 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | -9.1387 | Below
12 | 37712_g_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | S57212 | -9.1225 | Below
13 | 38576_at | H2B histone family member B |  | AJ223353 | -9.0869 | Below
14 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | -8.7026 | Below
15 |  | eukaryotic translation initiation factor 4 gamma 3 | EIF4G3 | AF012072 | -8.3540 | Below
16 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 |  | AL046940 | -8.3212 | Below
17 | 402_s_at | intercellular adhesion molecule 3 | ICAM3 | X69819 | -7.9741 | 
18 | 35112_at | regulator of G-protein signalling 9 | RGS9 | AF071476 | 7.8348 | Above
19 | 34850_at | ubiquitin-conjugating enzyme E2E 3 homologous to yeast UBC4/5 | UBE2E3 | AB017644 | 7.8197 | Above
20 | 37030_at | KIAA0887 protein |  | AB020694 | -7.6343 | Below






21 36322_at 

22 39509_at 

23 40091_at 

24 37280_at 

25 1325_at 

26 831_at 

27 37600_at 

28 41266_at 

29 36958_at 

30 36564_at 

31 32174_at 

32 619_s_at 

33 40749_at 



34 31894_at 

35 32319 at 



36 38259_at 

37 35629_at 



fucosyltransferase 7 alpha 1 3 
fiicosyltransferase 

Homo sapiens cDNAFLJ22071 
B-cell CLL/lymphoma 6 zinc 
finger protein 51 
MAD mothers against 
decapentaplegic Drosophila 
homolog 1 

MAD mothers against 
decapentaplegic Drosophila 
homolog 1 

DEAD/H Asp-Glu- Ala- Asp/His 
box polypeptide 10 RNA helicase 

extracellular matrix protein 1 

integrin alpha 6 

zyxin 

Human DNA sequence from clone 
RP5-1174N9 on chromosome 
lp34.1-35.3 
solute carrier family 9 
sodium/hydrogen exchanger 
isoform 3 regulatory factor 1 
membrane-spanning 4-domains 
subfamily A member 2 Fc 
fragment of IgE high affinity I 
receptor for beta polypeptide 
membrane-spanning 4-domains 
subfamily A member 2 Fc 
fragment of IgE high affinity I 
receptor for beta polypeptide 
centromere protein C 1 
tumor necrosis factor ligand 
superfamily member 4 tax- 
trans criptionally activated 
glycoprotein 1 34kD 
syntaxin binding protein 2 
hypothetical protein 



BCL6 
MADH1 



ECM1 
ITGA6 
ZYX 



AI692348 
U00115 



U68186 
X53586 
X95735 
W27419 



SLC9A3R1 AF0 15926 



CENPC1 
TNFSF4 



STXBP2 
DJ1042K10. 



M95724 
AL022310 



AB002559 
AL022238 



-7.6232 Below 

-7.6171 Below 

7.5991 Above 

7.5824 Above 

7.4276 Above 

-7.2991 Below 

7.2985 Above 

-7.2889 Below 

-7.2848 Below 

-7.2749 Below 

-7.2325 Below 

-7.2063 Below 



6.9679 Above 
6.8225 Above 



-6.6992 Below 
-6.6968 Below 



38 38700_at 

39 37397_at 



40 41127_at 



cysteine and glycine-rich protein 1 CSRP 1 
Homo sapiens platelet/endothelial PECAM 
cell adhesion molecule-1 
(PECAM- 1) gene, exon 16 and 
complete cds. 

solute carrier family 1 SLC1A4 
glutamate/neutral amino acid 



M33146 
L34657 



-6.6962 Below 
-6.6934 Below 



-6.6892 Below 










Table 21. Genes Selected by T statistics for T-ALL

No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 |  | B cell linker protein | SLP65 | AF068180 | -115.8362 | Below
2 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 27.6995 | Above
3 | 37988_at | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | -23.7294 | Below
4 | 38147_at | SH2 domain protein 1A Duncan s disease lymphoproliferative syndrome | SH2D1A | AL023657 | 22.4501 | Above
5 | 38522_s_at | CD22 antigen | CD22 | X52785 | -21.2795 | Below
6 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | -19.1460 | Below
7 | 36277_at | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | 19.0859 | Above
8 | 38604_at | neuropeptide Y | NPY | AI198311 | -18.8194 | Below
9 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | -18.6383 | Below
10 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | -18.5620 | Below
11 | 36638_at | connective tissue growth factor | CTGF | X78947 | -18.2772 | Below
12 | 32794_g_at | T cell receptor beta locus | TRB | X00437 | 17.9081 | Above
13 | 32174_at | solute carrier family 9 sodium/hydrogen exchanger isoform 3 regulatory factor 1 | SLC9A3R1 | AF015926 | 17.4427 | Above
14 | 160041_at | protein tyrosine phosphatase non-receptor type 18 brain-derived | PTPN18 | X79568 | -17.3412 | Below
15 | 38521_at | CD22 antigen | CD22 | X59350 | -17.0388 | Below
16 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | -16.7948 | Below
17 | 36571_at | topoisomerase DNA II beta 180kD | TOP2B | X68060 | -16.7508 | Below
18 | 1096_g_at | CD19 antigen | CD19 | M28170 | -16.4583 | Below
19 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | -16.2017 | Below
20 | 41710_at | hypothetical protein | LOC54103 | AL079277 | -15.9099 | Below
21 |  | H2.0 Drosophila like homeo box 1 | HLX1 | M60721 | -15.5425 | Below
22 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen | CD24 | L33930 | -15.0123 | Below
23 | 36502_at | PFTAIRE protein kinase 1 | PFTK1 | AB020641 | -14.9972 | Below
24 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | -14.9886 | Below
25 | 37539_at | RalGDS-like gene KIAA0959 protein | KIAA0959 | AB023176 | -14.6872 | Below
26 | 40775_at | integral membrane protein 2A | ITM2A | AL021786 | 14.5666 | Above
27 | 34033_s_at | leukocyte immunoglobulin-like receptor subfamily A with TM domain member 2 | LILRA2 | AF025531 | -14.3809 | Below
28 2031_s_at cyclin-dependent kinase inhibitor 1A p21 Cip1



29 3805 l_at 

30 35794_at 

31 41156_g_at 

32 32979_at 

33 32562_at 

34 36536_at 

35 36108_at 

36 41734_at 

37 41153_f_at 

38 37710_at 

39 39893_at 

40 37908_at 



mal T-cell differentiation protein 
KIAA0942 protein 
catenin cadherin-associated 
protein alpha 1 102kD 

GRB2-associated binding protein 
1 

endoglin Osler-Rendu- Weber 
syndrome 1 

schwannomin interacting protein 1 
major histocompatibility complex 
class II DQ beta 1 

KIAA0870 protein 
Homo sapiens alphaE-catenin 
(CTNNA1) gene, exon 18 and 
complete cds. 

MADS box transcription enhancer 
factor 2 polypeptide C myocyte 
enhancer factor 2C 
guanine nucleotide binding protein 
G protein gamma 7 

guanine nucleotide binding protein 



MAL 

KIAA0942 
CTNNA1 

GAB1 

ENG 

SCffiP-1 
HLA-DQB1 

KIAA0870 
CTNNA1 

MEF2C 

GNG7 

GNG11 



X76220 

AB023159 

U03100 

U43885 

X72012 

AF070614 
M16276 

AB020677 
AF1 02803 



AB010414 
U31384 



-14.1071 Below 



14,0743 Above 
-13.9659 Below 



-13.5842 Below 

-13.4209 Below 

-13.4172 Below 

-13.3518 Below 

-13.2672 Below 

-12.7927 Below 

-12.7716 Below 

-12.7696 Below 

-12.7353 Below 



Table 22. Genes Selected by T statistics for TEL-AML1 



No. | Affymetrix number | Gene Name | Symbol | Reference number | T-stat | Above/Below Mean
1 | 38578_at | tumor necrosis factor receptor superfamily member 7 |  | M63928 | 15.2209 | Above
2 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N m |  | U69883 | 15.0804 | Above
3 |  | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 |  | 14.9774 | Above
4 | 37780_at | piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 14.1405 | Above
5 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 | 12.9369 | Above
6 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | 12.5429 | Above
7 | 1980_s_at | non-metastatic cells 2 protein NM23B expressed in | NME2 | X58965 | -12.5035 | Below
8 |  | protein tyrosine phosphatase receptor type K | PTPRK |  | 12.3871 | Above
9 | 34194_at | Homo sapiens cDNA FLJ21697 |  | AL049313 | 12.1089 | Above
10 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 11.4322 | Above
11 | 40272_at | collapsin response mediator | CRMP1 |  | 11.0625 | Above
12 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 11.0133 | Above
13 | 33690_at | Homo sapiens mRNA cDNA DKFZp434A202 |  | AL080190 | 10.8763 | Above
14 | 32730_at | Homo sapiens mRNA for KIAA1750 |  | AL080059 | 10.7439 | Above
15 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 10.5332 | Above
16 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 10.3692 | Above
17 |  | telomeric repeat binding factor 2 |  |  | 10.2921 | Above
18 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 | Z46973 | 10.0568 | Above
19 | 36537_at | Rho-specific guanine nucleotide exchange factor p114 | P114-RHO-GEF | AB011093 | 9.8824 | Above
20 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 9.8662 | Above
21 | 1936_s_at | proto-oncogene c-myc, alt. transcript 3, ORF 114 |  | HG3523-HT4899 | -9.6621 | Below
22 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | 9.4563 | Above
23 | 38763_at | Human (clone D21-1) L-iditol-2 dehydrogenase gene, exon 9 and complete cds. |  | L29254 | -9.2719 | Below
24 | 41295_at | GTT1 protein | GTT1 | AL041780 | -9.1813 | Below
25 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 9.1682 | Above
26 | 38570_at | major histocompatibility complex class II DO beta | HLA-DOB | X03066 | 9.0394 | Above
27 | 32163_f_at | EST |  | [illegible] |  | Above
28 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 8.9931 | Above
29 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | 8.9571 | Above
30 | 932_i_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | 8.8075 | Above
31 | 37343_at | inositol 1 4 5-triphosphate receptor type 3 | ITPR3 | U01062 | 8.7321 | Above
32 | 33447_at | myosin light polypeptide regulatory non-sarcomeric 20kD | MLCB | X54304 | -8.6848 | Below
33 | 35362_at | myosin X | MYO10 | AB018342 | 8.6700 | Above
34 | 38906_at | spectrin alpha erythrocytic 1 elliptocytosis 2 | SPTA1 | M61877 | 8.5010 | Above
35 | 324_f_at | basic transcription factor 3 | BTF3 | HG1515-HT1515 | -8.4705 | Below
36 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | -8.3219 | Below
37 | 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | 8.2693 | Above










38 


40729_s_at 


nuclear factor of kappa light 


NFKBIL1 


Y14768 


8.2000 


Above 



polypeptide gene enhancer in B- 
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39 41442_at core-binding factor runt domain CBFA2T3 AB010419 8.0604 Above 

alpha subunit 2 translocated to 3 

40 36275_at Homo sapiens mRNA from AB002438 7.8550 Above 

chromosome 5q21-22 clone 
FBR89 

4. Wilkins' 

This method of selecting genes uses the weighted sum of three components to
estimate the discriminative value of each gene. The higher the score, the better the
gene is at discriminating between the two classes. The input to the scoring method is
preprocessed and normalized data. The idea of the metric is that a gene is a good
discriminator if: (1) it is expressed in one class and not in the other, or it is
expressed in both classes but significantly more so in one than the other, or (2) the
gene is present in most samples, and the data are pure, in the sense that there is a
threshold expression value for the gene such that the gene generally has expression
levels larger than the threshold in one class, and smaller than the threshold in the other
class. The components of the metric were quantified as follows. For a gene, assume
PR_1 is the ratio of "present" samples to all samples in class 1, where present means
that the gene's expression value was not preprocessed to a constant (1). Assume PR_2
is defined similarly for class 2. The first component of the metric, M_1, is estimated as
the absolute difference between PR_1 and PR_2. This value is between 0 (when the gene
is equally present in both classes) and 1 (when the gene is expressed in one class and
not in the other). The second component of the metric, M_2, measures the extent to
which the gene is present overall, and is defined as the average of PR_1 and PR_2. The
final component, M_3, estimates the "purity", or existence of a threshold value. The
gene expression values for the present samples are sorted into ascending order and a
vector of their class labels is built, for example {+, +, +, -, -, -, +, -, -, +, -}. The next
step is to find the best place to partition the samples so that the expression values for
one class (say, +) are less than the partition point, and the values from the other
class are larger. Let L_c1 and L_c2 be the number of class 1 and class 2 samples on the
left side of the partition, respectively. Assume R_c1 and R_c2 are defined similarly for
the right side of the partition. Then the purity is estimated as:
max{L_c1 - L_c2 + R_c2 - R_c1, L_c2 - L_c1 + R_c1 - R_c2} / number of total present
samples. Each possible partition is checked. In the example above, the partition
{+, +, +, || -, -, -, +, -, -, +, -} is the best partition: taking + as class 1, L_c1 = 3,
L_c2 = 0, R_c1 = 2 and R_c2 = 6, which gives a purity value of
M_3 = (3 - 0 + 6 - 2) / 11 = 7 / 11 = 0.64. The score for the gene is the
weighted sum 0.5*M_1 + 0.25*M_2 + 0.25*M_3. The top 50 genes for each subgroup
selected by this metric are listed in Tables 23-29. For class prediction all 50 genes
were used, unless otherwise stated.
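
The scoring steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used to produce Tables 23-29. The function names `purity` and `wilkins_score`, the constant `ABSENT`, and the coding of the two classes as the integers 1 and 2 are assumptions made for the example. The text does not say whether a partition with an empty side is allowed, so the sketch assumes both sides must contain at least one sample.

```python
from typing import Sequence

# Assumption: preprocessing replaces "absent" expression values with this constant.
ABSENT = 1.0


def purity(sorted_labels: Sequence[int]) -> float:
    """M_3 for class labels (1 or 2) already ordered by ascending expression."""
    n = len(sorted_labels)
    total_c1 = sum(1 for lab in sorted_labels if lab == 1)
    total_c2 = n - total_c1
    best = 0
    l_c1 = l_c2 = 0
    # Slide the partition point left to right, keeping both sides non-empty.
    for lab in sorted_labels[:-1]:
        if lab == 1:
            l_c1 += 1
        else:
            l_c2 += 1
        r_c1, r_c2 = total_c1 - l_c1, total_c2 - l_c2
        best = max(best, l_c1 - l_c2 + r_c2 - r_c1, l_c2 - l_c1 + r_c1 - r_c2)
    return best / n if n else 0.0


def wilkins_score(values: Sequence[float], labels: Sequence[int],
                  weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted sum of M_1, M_2 and M_3 for one gene across all samples."""
    n1 = sum(1 for lab in labels if lab == 1)
    n2 = len(labels) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both classes need at least one sample")
    # Keep only the samples in which the gene is "present".
    present = [(v, lab) for v, lab in zip(values, labels) if v != ABSENT]
    pr1 = sum(1 for _, lab in present if lab == 1) / n1
    pr2 = sum(1 for _, lab in present if lab == 2) / n2
    m1 = abs(pr1 - pr2)          # difference in presence between classes
    m2 = (pr1 + pr2) / 2         # overall presence
    present.sort(key=lambda pair: pair[0])
    m3 = purity([lab for _, lab in present])
    w1, w2, w3 = weights
    return w1 * m1 + w2 * m2 + w3 * m3


# The label vector from the text, with "+" coded as 1 and "-" as 2.
example = [1, 1, 1, 2, 2, 2, 1, 2, 2, 1, 2]
print(round(purity(example), 2))  # 0.64
```

Under these assumptions, ranking genes for a subgroup amounts to computing `wilkins_score` for every gene (subgroup versus all other samples) and keeping the 50 highest-scoring genes.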



Table 23. Genes Selected by Wilkins' for BCR-ABL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 0.6354 |
2 | 37479_at | CD72 antigen | CD72 | M54992 | 0.6352 | Below
3 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 0.6265 | Above
4 | 37397_at | platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene | PECAM | L34657 | 0.6161 | Above
5 | 33162_at | insulin receptor | INSR | X02160 | 0.6118 | Below
6 | 39691_at | SH3-containing protein SH3GLB1 | SH3GLB1 | AB007960 | 0.6089 | Above
7 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | 0.6087 | Above
8 | 34759_at | Human hbc647 mRNA sequence | | U68494 | 0.6061 | Above
9 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | 0.6040 | Above
10 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | 0.6021 | Above
11 | 38312_at | DKFZp564O222 from clone DKFZp564O222 | | AL050002 | 0.6010 | Above
12 | 35970_g_at | M-phase phosphoprotein 9 | MPHOSPH9 | N23137 | 0.5989 | Above
13 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 0.5989 | Above
14 | | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | 0.5980 | Above
15 | 40953_at | calponin 3 acidic | CNN3 | S80562 | 0.5972 | Above
16 | 1434_at | phosphatase and tensin homolog mutated in multiple advanced cancers 1 | PTEN | U92436 | 0.5963 | Below
17 | 38966_at | glycoprotein synaptic 2 | GPSN2 | AF038958 | 0.5953 | Above
18 | 35991_at | Sm protein F | LSM6 | AA917945 | 0.5938 | Above
19 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 0.5938 | Above
20 | 38032_at | KIAA0736 gene product | KIAA0736 | AB018279 | 0.5934 | Above
21 | 1983_at | cyclin D2 | CCND2 | X68452 | 0.5927 | Above
22 | 36194_at | low density lipoprotein-related protein-associated protein 1 alpha-2-macroglobulin receptor-associated protein 1 | LRPAP1 | M63959 | 0.5914 | Below
23 | 34460_at | peripheral benzodiazepine receptor-associated protein 1 | PRAX-1 | AB014512 | 0.5911 | Above
24 | 2001_g_at | ataxia telangiectasia mutated includes complementation groups A C and D | ATM | U26455 | 0.5910 | Above
25 | 31443_at | AML1 | | | 0 S8Q6 | Above
26 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 0.5896 | Above
27 | 37472_at | mannosidase beta A lysosomal | MANBA | U60337 | 0.5887 | Below
28 | 36099_at | splicing factor arginine/serine-rich 1 splicing factor 2 alternate splicing factor | SFRS1 | M69040 | 0.5877 | Below
29 | 38636_at | immunoglobulin superfamily containing leucine-rich repeat | ISLR | AB003184 | 0.5858 | Above
30 | 34314_at | ribonucleotide reductase M1 polypeptide | RRM1 | X59543 | 0.5858 | Below
31 | 36129_at | KIAA0397 gene product | KIAA0397 | AB007857 | 0.5858 | Above
32 | 40264_g_at | zinc finger protein-like 1 | ZFPL1 | AF001891 | 0.5858 | Above
33 | 37399_at | aldo-keto reductase family 1 [...] dehydrogenase type II | AKR1C3 | D17793 | 0.5852 | Above
34 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | 0.5832 | Above
35 | 41649_at | FOXJ2 forkhead factor | LOC55810 | AF038177 | 0.5832 | Above
36 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.5832 | Above
37 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 | 0.5832 | Above
38 | 2064_g_at | excision repair cross-complementing rodent repair deficiency complementation group | ERCC5 | L20046 | 0.5832 | Above
39 | 39729_at | Human natural killer cell enhancing factor (NKEFB) mRNA, complete cds. | NKEFB | L19185 | 0.5829 | Below
40 | 38270_at | poly ADP-ribose glycohydrolase | PARG | AF005043 | 0.5828 | Below
41 | 40613_at | uncharacterized hypothalamus protein HT012 | | at mms | | Below
42 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 0.5813 | Above
43 | 40782_at | short-chain dehydrogenase/reductase 1 | SDR1 | AF061741 | | Above
44 | 34256_at | lactosylceramide alpha-2 3-sialyltransferase GM3 synthase | SIAT9 | AB018356 | 0.5797 | Above
45 | 41836_at | protein with polyglutamine repeat calcium ca2 homeostasis endoplasmic reticulum protein | ERPROT213-21 | U94836 | 0.5777 | Above
46 | 35681_r_at | zinc finger homeobox 1B | ZFHX1B | AB011141 | 0.5759 | Below
47 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | 0.5759 | Below
48 | 32788_at | RAN binding protein 2 | RANBP2 | D42063 | 0.5756 | Above
49 | 828_at | prostaglandin E receptor 2 subtype EP2 53kD | PTGER2 | U19487 | 0.5740 | Above
50 | 38220_at | dihydropyrimidine dehydrogenase | DPYD | U20938 | 0.5737 | Above

Table 24: Genes Selected by Wilkins' for E2A-PBX1

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 0.8750 | Above
2 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 0.8252 | Below
3 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 0.8040 | Above
4 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 0.7899 | Above
5 | 753_at | nidogen 2 | NID2 | D86425 | 0.7368 | Above
6 | 717_at | GS3955 protein | GS3955 | D87119 | 0.7306 | Above
7 | 1786_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 0.7300 | Above
8 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 0.7271 | Below
9 | 1065_at | fms-related tyrosine kinase 3 | FLT3 | U02687 | 0.7160 | Below
10 | 36650_at | cyclin D2 | CCND2 | D13639 | 0.7151 | Below
11 | 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | 0.7096 | Above
12 | 33748_at | minor histocompatibility antigen HA-1 | KIAA0223 | D86976 | 0.7084 | Below
13 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | 0.7033 | Above
14 | 38717_at | DKFZP586A0522 protein | DKFZP586A0522 | AL050159 | 0.7003 | Below
15 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 0.6982 | Above
16 | 33641_g_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 069 | Below
17 | 40468_at | KIAA0554 protein | KIAA0554 | AB011126 | 0.6971 | Below
18 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 0.6965 | Below
19 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.6938 | Below
20 | 362_at | protein kinase C zeta | PRKCZ | Z15108 | 0.6904 | Above
21 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | 0.6877 | Below
22 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | 0.6875 | Below
23 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 0.6863 | Above
24 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | 0.6837 | Below
25 | 38748_at | adenosine deaminase RNA-specific B1 homolog of rat RED1 | ADARB1 | U76421 | 0.6763 | Above
26 | 41409_at | basement membrane-induced gene | ICB-1 | AF044896 | 0.6757 | Below
27 | 34892_at | tumor necrosis factor receptor superfamily member 10b | TNFRSF10B | AF016266 | 0.6726 | Below
28 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 0.6710 | Above
29 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.6667 | Below




WO 03/083140 



PCT/US03/08486



30 | 34583_at | fms related tyrosine kinase 3 | FLT3 | U02687 | 0.6665 | Below
31 | 36900_at | stromal interaction molecule 1 | | U52426 | 0.6650 |
32 | da j /ozj at | | IHF4 11 | U52682 | 0.6636 | Above
33 | 38340_at | related | KIAA0655 | AB014555 | 0.6609 | Above
34 | 1830_s_at | transforming growth factor beta 1 | TGFB1 | M38449 | 0.6608 | Below
35 | 37099_at | arachidonate 5-lipoxygenase-activating protein | ALOX5AP | AI806222 | 0.6605 | Below
36 | 38254_at | KIAA0882 protein | KIAA0882 | AB020689 | 0.6539 | Below
37 | 37641_at | Human gene for hepatitis C-associated microtubular aggregate protein p44, exon 9 and complete cds. | | D28915 | 0.6531 | Below
38 | 33865_at | adenovirus 5 E1A binding protein | BS69 | AA127624 | 0.6515 | Below
39 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.6502 | Below
40 | 40113_at | GS3955 protein | GS3955 | D87119 | 0.6476 | Above
41 | 32979_at | GRB2-associated binding protein 1 | GAB1 | U43885 | 0.6457 | Below
42 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.6427 | Below
43 | 38739_at | v-ets avian erythroblastosis virus E26 oncogene homolog 2 | ETS2 | AF017257 | 0.6424 | Below
44 | 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | 0.6363 | Above
45 | 538_at | CD34 antigen | CD34 | S53911 | 0.6326 | Below
46 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | 0.6318 | Above
47 | 41017_at | myosin-binding protein H | MYBPH | U27266 | 0.6297 | Above
48 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.6260 | Below
49 | 37281_at | KIAA0233 gene product | KIAA0233 | D87071 | 0.6250 | Below
50 | 35675_at | vinexin beta SH3-containing adaptor molecule-1 | SCAM-1 | AF037261 | 0.6229 | Below


Table 25. Genes Selected by Wilkins' for Hyperdiploid > 50

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | 0.5838 | Below
2 | 41470_at | Prominin mouse like 1 | PROML1 | AF027208 | 0.5616 | Above
3 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | 0.5423 | Below
4 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 0.5399 | Above
5 | 578_at | Human recombination activating protein (RAG2) gene, last exon | RAG2 | M94633 | 0.5208 | Below
6 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 0.5164 | Above
7 | 40480_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | 0.5090 | Above
8 | 38604_at | neuropeptide Y | NPY | AI198311 | 0.5083 | Above
9 | 40903_at | ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9 | APT6M8-9 | | 0.5080 | Above
10 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 0.5057 | Above
11 | 37272_at | inositol 1 4 5-trisphosphate 3-kinase B | ITPKB | X57206 | 0.5025 | Below
12 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | 0.5018 | Above
13 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 0.4977 | Below
14 | 36885_at | spleen tyrosine kinase | SYK | L28824 | 0.4964 | Below
15 | 1630_s_at | tyrosine kinase syk | syk | HG3730-HT4000 | 0.4913 | Below
16 | 38317_at | transcription elongation factor A SII like 1 | TCEAL1 | M99701 | 0.4901 | Above
17 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | 0.4898 | Below
18 | 39721_at | ephrin-B1 | EFNB1 | U09303 | 0.4895 | Above
19 | 33307_at | kraken-like | BK126B4.1 | AL022316 | 0.4880 | Below
20 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | 0.4879 | Above
21 | 39402_at | interleukin 1 beta | IL1B | M15330 | 0.4750 | Above
22 | 36489_at | phosphoribosyl pyrophosphate | PRPS1 | D00860 | 0.4718 | Above
23 | | n V (ANX5) gene, exon 13. | ANX5 | U05770 | 0.4717 | Above
24 | 40200_at | heat shock transcription factor 1 | | M64673 | 0.4689 | Below
25 | 35940_at | POU domain class 4 transcription factor 1 | | X64624 | 0.4685 | Above
26 | 35727_at | hypothetical protein FLJ20517 | FLJ20517 | AI249721 | 0.4675 | Below
27 | 1357_at | ubiquitin specific protease 4 proto- | USP4 | U20657 | 0.4670 | Below
28 | 36592_at | prohibitin | PHB | S85655 | 0.4668 | Above
29 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 0.4635 | Above
30 | | DNA segment on chromosome X unique 9879 expressed sequence | DXS9879E | X92896 | 0.4608 | Above
31 | 40846_g_at | interleukin enhancer binding factor 3 90Kd | ILF3 | U10324 | 0.4605 | Below
32 | 41132_r_at | heterogeneous nuclear ribonucleoprotein H2 H | HNRPH2 | U01923 | 0.4605 | Above
33 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.4595 | Below
34 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 0.4594 | Above
35 | 890_at | ubiquitin-conjugating enzyme E2A RAD6 homolog | UBE2A | M74524 | 0.4570 | Above
36 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | 0.4568 | Above
37 | 38458_at | Human cytochrome b5 (CYB5) gene, exon 6 and complete cds. | CYB5 | L39945 | 0.4552 | Above
38 | 38869_at | KIAA1069 protein | KIAA1069 | | |
39 | 915_at | interferon-induced protein with tetratricopeptide repeats 1 | IFIT1 | | 0.4544 | Above
40 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.4535 | Above
41 | 39301_at | calpain 3 p94 | CAPN3 | X85030 | 0.4533 | Below
42 | 41425_at | Friend leukemia virus integration 1 | FLI1 | M98833 | 0.4519 | Below
43 | 2094_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | K00650 | 0.4514 | Above
44 | 36605_at | transcription factor 4 | TCF4 | M74719 | 0.4497 | Above
45 | 37709_at | DNA segment numerous copies expressed probes GS1 gene | DXF68S1E | M86934 | 0.4493 | Above
46 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | 0.4488 | Above
47 | 171_at | von Hippel-Lindau binding protein 1 | VBP1 | U56833 | 0.4473 | Above
48 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 0.4466 | Above
49 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.4448 | Above
50 | 35843_at | Homo sapiens mRNA cDNA DKFZp434D0935 | | L40402 | 0.4443 | Above


Table 26. Genes Selected by Wilkins' for MLL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 39402_at | interleukin 1 beta | IL1B | M15330 | 0.7355 | Below
2 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | 0.7221 | Below
3 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 0.7178 | Below
4 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.7021 | Below
5 | 36650_at | cyclin D2 | CCND2 | D13639 | 0.6759 | Below
6 | 37043_at | inhibitor of DNA binding 3 dominant negative helix-loop-helix protein | ID3 | AL021154 | 0.6743 | Below
7 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 0.6689 | Below
8 | 40913_at | ATPase Ca transporting plasma membrane 4 | ATP2B4 | W28589 | 0.6684 | Below
9 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.6554 | Below
10 | 37398_at | platelet/endothelial cell adhesion molecule CD31 antigen | PECAM1 | AA100961 | 0.6548 | Below
11 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | 0.6478 | Below
12 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.6432 | Below
13 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 0.6421 | Below
14 | 38336_at | KIAA1013 protein | KIAA1013 | AB023230 | 0.6395 | Below
15 | 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | 0.6363 | Below
16 | 38671_at | KIAA0620 protein | KIAA0620 | AB014520 | |
17 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 | LGALS1 | AI535946 | 0.6351 | Above
18 | 40451_at | hypothetical protein FLJ21434 | FLJ21434 | AL080203 | 0.6350 | Below
19 | 36908_at | Human macrophage mannose receptor (MRC1) gene, exon 30. | MRC1 | M93221 | 0.6290 | Below
20 | 963_at | ligase IV DNA ATP-dependent | LIG4 | X83441 | 0.6282 | Below
21 | 41346_at | like-glycosyltransferase | LARGE | AJ007583 | 0.6214 | Below
22 | 32207_at | membrane protein palmitoylated 1 55kD | | M64925 | 0.6155 | Below
23 | 2062_at | insulin-like growth factor binding | IGFBP7 | L19182 | 0.6145 | Above
24 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.6137 | Below
25 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 0.6075 | Above
26 | 32193_at | plexin C1 | PLXNC1 | AF030339 | 0.6065 | Above
27 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 0.6046 | Below
28 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | 0.5991 |
29 | 34168_at | deoxynucleotidyltransferase terminal | DNTT | M11722 | 0.5979 | Below
30 | 36383_at | v-ets avian erythroblastosis virus E26 oncogene related | ERG | M17254 | 0.5976 | Below
31 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 0.5976 | Below
32 | 39263_at | 2 5 oligoadenylate synthetase 2 | OAS2 | M87434 | 0.5967 | Below
33 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | 0.5953 |
34 | 34699_at | CD2-associated protein | CD2AP | AL050105 | 0.5945 | Below
35 | 1267_at | protein kinase C eta | PRKCH | M55284 | 0.5941 | Below
36 | 35172_at | tyrosylprotein sulfotransferase 2 | TPST2 | AF049891 | 0.5937 | Below
37 | 38124_at | midkine neurite growth-promoting factor 2 | MDK | X55110 | 0.5936 | Below
38 | 33813_at | tumor necrosis factor receptor superfamily member 1B | TNFRSF1B | AI813532 | 0.5934 | Below
39 | 34176_at | | LOC57228 | AF091087 | 0.5930 | Below
40 | 39424_at | tumor necrosis factor receptor [...] entry mediator | TNFRSF14 | U70321 | 0.5930 | Below
41 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.5905 | Below
42 | 32607_at | brain acid-soluble protein 1 | BASP1 | AF039656 | 0.5905 | Above
43 | 38342_at | KIAA0239 protein | KIAA0239 | D87076 | 0.5896 | Below
44 | 32533_s_at | vesicle-associated membrane protein 5 myobrevin | VAMP5 | AF054825 | 0.5880 | Below
45 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | 0.5867 | Below
46 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 0.5848 | Above
47 | 39338_at | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | 0.5844 | Above
48 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | 0.5824 | Below
49 | 39712_at | S100 calcium-binding protein A13 | S100A13 | AI541308 | 0.5818 | Below
50 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 | | AL049397 | 0.5811 | Above



Table 27: Genes Selected by Wilkins' for Novel Risk Group

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 0.8668 | Above
2 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | 0.8614 | Below
3 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 0.8505 | Above
4 | 994_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 0.7694 | Above
5 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.7399 | Below
6 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | | Above
7 | 41159_at | Clathrin heavy polypeptide Hc | CLTC | D21260 | 0.7283 | Below
8 | 39728_at | interferon gamma-inducible protein | IFI30 | J03909 | 0.7138 |
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 | D86961 | 0.7069 | Above
10 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | 0.7049 | Below
11 | 41438_at | KIAA1451 protein | KIAA1451 | AL049923 | 0.6999 | Below
12 | 34370_at | Archain 1 | ARCN1 | X81198 | 0.6999 | Below
13 | 36029_at | chromosome 11 open reading frame 8 | C11ORF8 | U57911 | 0.6964 | Above
14 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 0.6947 | Above
15 | 35869_at | MD-1 RP105-associated | MD-1 | AB020499 | 0.6908 | Below
16 | 36601_at | Vinculin | VCL | M33308 | 0.6908 | Below
17 | 40775_at | Integral membrane protein 2A | ITM2A | AL021786 | 0.6879 | Above
18 | 37281_at | KIAA0233 gene product | KIAA0233 | D87071 | 0.6837 | Below
19 | 957_at | Arrestin, beta 2 | ARRB2 | HG2059-HT2114 | 0.6744 | Below
20 | 33284_at | myeloperoxidase | MPO | M19507 | 0.6712 | Below
21 | 40585_at | adenylate cyclase 7 | ADCY7 | D25538 | 0.6712 | Below
22 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 0.6656 | Above
23 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 | 0.6581 | Below
24 | 38576_at | H2B histone family member B | H2BFB | AJ223353 | 0.6576 | Below
25 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.6576 | Below
26 | 37712_g_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | | S57212 | 0.6576 | Below
27 | 33924_at | KIAA1091 protein | KIAA1091 | AB029014 | 0.6484 | Below
28 | | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | 0.6466 | Above
29 | 33358_at | EST (retina) | | W29087 | 0.6457 | Above
30 | 33740_at | | C1ORF2 | AF023268 | 0.6441 | Below
31 | 36588_at | KIAA0810 protein | KIAA0810 | AB018353 | 0.6441 | Below
32 | 38802_at | progesterone binding protein | HPR6.6 | Y12711 | 0.6441 | Below
33 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | |
34 | 32227_at | proteoglycan 1 secretory granule | PRG1 | X17042 | 0.6409 | Below
35 | 34840_at | Homo sapiens cDNA FLJ22642 fis clone HSI06970 | | AI700633 | |
36 | | mitogen-activated protein kinase kinase 2 | MAP2K2 | L11285 | 0.6409 | Below
37 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 0.6391 | Above
38 | 38006_at | CD48 antigen B-cell membrane | CD48 | M37766 | 0.6342 | Below
39 | | eukaryotic [...] factor 4 gamma 3 | EIF4G3 | AF012072 | |
40 | | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 0.6304 | Below
41 | 39781_at | [...] protein 4 | IGFBP4 | U20982 | 0.6301 | Below
42 | 39893_at | guanine nucleotide binding protein G protein gamma 7 | GNG7 | AB010414 | 0.6301 | Below
43 | 37326_at | proteolipid protein 2 colonic epithelium-enriched | PLP2 | U93305 | 0.6267 | Below
44 | 36687_at | cytochrome c oxidase subunit VIII | COX7B | N50520 | 0.6266 | Below
45 | 40423_at | KIAA0903 protein | KIAA0903 | AB020710 | 0.6254 | Above
46 | 32542_at | four and a half LIM domains 1 | FHL1 | AF063002 | 0.6236 | Below
47 | 33232_at | cysteine-rich protein 1 intestinal | CRIP1 | AI017574 | 0.6211 | Below
48 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.6208 | Above
49 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 0.6208 | Above
50 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.6199 | Below

Table 28. Genes Selected by Wilkins' for T-ALL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 38242_at | B cell linker protein | SLP65 | AF068180 | 0.8683 | Below
2 | 37988_at | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | 0.8422 | Below
3 | 1096_g_at | CD19 antigen | CD19 | M28170 | 0.8181 | Below
4 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | 0.8128 | Below
5 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 0.8127 | Below
6 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | 0.8053 | Below
7 | 38147_at | SH2 domain protein 1A Duncan's disease lymphoproliferative syndrome | SH2D1A | AL023657 | 0.8016 | Above
8 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | 0.7914 | Below
9 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | 0.7900 | Above
10 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen | CD24 | L33930 | 0.7867 | Below
11 | 38521_at | CD22 antigen | CD22 | X59350 | 0.7856 | Below
12 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | 0.7835 | Below
13 | 34033_s_at | leukocyte immunoglobulin-like receptor subfamily A with TM domain member 2 | | | 0.7761 | Below
14 | 36638_at | connective tissue growth factor | CTGF | X78947 | 0.7755 | Below
15 | 38213_at | | GLA | U78027 | 0.7701 | Below
16 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | 0.7693 | Below
17 | 37711_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | S57212 | 0.7560 | Below
18 | 36239_at | POU domain class 2 associating factor 1 | POU2AF1 | Z49194 | 0.7440 | Below
19 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 0.7426 | Above
20 | 38894_g_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 0.7422 | Below
21 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | 0.7414 | Below
22 | 38017_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 0.7360 | Below
23 | 41156_g_at | catenin cadherin-associated protein alpha 1 102kD | CTNNA1 | U03100 | 0.7315 | Below
24 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 0.7292 | Below
25 | 37710_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | | 0.7283 | Below
26 | 41155_at | catenin cadherin-associated protein alpha 1 102kD | | | 0.7278 | Below
27 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 0.7258 |
28 | 34224_at | fatty acid desaturase 3 | FADS3 | AC004770 | 0.7254 | Below
29 | 38604_at | neuropeptide Y | NPY | AI198311 | 0.7212 | Below
30 | 36773_f_at | class II DQ beta 1 | HLA-DQB1 | M81141 | 0.7197 | Below
31 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | 0.7180 | Below
32 | 36502_at | PFTAIRE protein kinase 1 | PFTK1 | AB020641 | 0.7179 | Below
33 | jj j 1 1 ou_at | phospholipase C gamma 2 | | X14034 | 0.7114 | Below
34 | 38893_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 0.7100 | Below
35 | 387_at | cyclin-dependent kinase 9 CDC2-related kinase | CDK9 | X80230 | 0.7024 | Below
36 | 32035_at | Human MHC class II HLA-DRw53-associated glycoprotein beta-chain mRNA complete cds | | M16942 | 0.6992 | Below
37 | 41153_f_at | Homo sapiens alphaE-catenin (CTNNA1) gene | CTNNA1 | AF102803 | 0.6976 | Below
38 | 40780_at | C-terminal binding protein 2 | CTBP2 | AF016507 | 0.6976 | Below
39 | 40775_at | integral membrane protein 2A | ITM2A | AL021786 | 0.6952 | Above


40 39402_at 


interleukin 1 beta 


IL1B 


M15330 


0.6945 


Below 


41 38522_s_at 


CD22 antigen 


CD22 


X52785 


0.6945 


Below 


42 41166_at 


immunoglobulin heavy constant mu IGHM 


X58529 


0.6941 


Below 


43 36937_s_at 


PDZ and LIM domain 1 elfin 


PDLIM1 


U90878 


0.6937 


Below 


44 38833_at 


Human mRNA for SB classll 




X00457 


0.6925 


Below 




histocompatibility antigen alpha- 










45 2047_s_at 


junction plakoglobin 


JUP 


M23410 


0.6920 


Below 


46 36277_at 


Human membran protein (CD3- 


CD3E 


M23323 


0.6899 


Above 




epsilon) gene, exon 9. 










47 40688 at 


linker for activation of T cells 


LAT 


AJ223280 






48 39389_at 


CD9 antigen p24 


CD9 


M38690 


0.6879 


Below 


49 33162_at 


Insulin receptor 


INSR 


X02160 


0.6879 


Below 


50 31891_at 


chitinase 3-like 2 


CHI3L2 


U58515 


0.6872 


Above 




Table 29. Genes selected by Wilkins' for TEL-AML1

Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 37780_at | Piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 0.7121 | Above
2 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | 0.7086 | Above

3 36524_at

4 38578_at 

5 32730_at 

6 34194_at 

7 40272_at 

8 41819_at 

9 1488_at 

10 35665_at 

11 35614_at 



Rho guanine nucleotide exchange 
factor GEF 4 

tumor necrosis factor receptor 
superfamily member 7 

Homo sapiens mRNA for KIAA1750 
protein partial cds 

Homo sapiens cDNA FLJ21697 fis 
clone COL09740 



typeK 

phosphoinositide-3 -kinase class 3 



loop-helix 

protein tyrosine phosphatase type IVA PTP4A3 



ARHGEF4 


AB029035 


0.6782 


Above 


TNFRSF7 


M63928 


0.6718 


Above 




AL080059 


0.6616 


Above 




AL049313 






CRMP1 

FYB 

PTPRK 


D78012 
U93049 
L77886 


0.6160 
0.6058 
0.6056 


Above 
Above 
Above 


PIK3C3 
TCFL5 


Z46973 
AB012124 


0.6022 
0.5983 


Above 
Above 



13 35362_at 

14 37908_at 

15 39329_at 

16 1936_s_at 



18 39389_at 

19 37343_at 

20 1299_at 

21 38652_at 

22 38763_at 

23 37724_at 

24 36937_s_at 

25 1325_at 



27 39827_at 

28 32724_at 

29 31786_at 

30 38570_at 



DKFZ P 434 
A202 



Actinin alpha 1 

proto-oncogene c-myc, alt. transcript 
3,ORF 114 

Homo sapiens mRNA cDNA 
DKFZp434A202 

CD9 antigen p24 CD9 
inositol 1 4 5 -triphosphate receptor ITPR3 
type 3 

telomeric repeat binding factor 2 TERF2 
hypothetical protein FLJ20154 FLJ20154

(clone D2 1-1) L-iditol-2
dehydrogenase gene

v-myc avian myelocytomatosis viral MYC
oncogene homolog

PDZ and LIM domain 1 elfin PDLIM1

MAD mothers against MADH1
decapentaplegic Drosophila homolog
1

adaptor-related protein complex 1 AP1S2
sigma 2 subunit

hypothetical protein FLJ20500
phytanoyl-CoA hydroxylase Refsum PHYH 
disease 

Sam68-like phosphotyrosine protein T-STAR 
T-STAR 

major histocompatibility complex HLA-DOB 
class II DO beta 



0.5976 Above 



MYO10 


AB018342 


0.5964 


Above 


GNG11 


U31384 


0.5888 


Above 


ACTN1 


X15804 


0.5840 


Below 




HG3523- 


0.5761 


Below 




HT4899 







0.5725 Above 



31 39330_s_at actinin alpha 1 



ACTN1 



M38690 


0.5684 


Below 


U01062 


0.5642 


Above 


X93512 


0.5585 


Above 


AF070644 


0.5563 


Above 


L29254 


0.5535 


Below 


V00568 


0.5506 


Below 


U90878 


0.5506 


Below 


U59423 


0.5482 


Above 


AF091077 


0.5474 


Below 


AA522530 


0.5471 


Below 


AF023462 


0.5459 


Above 


AF051321 


0.5403 


Above 


X03066 


0.5384 


Above 


M95178 


0.5375 


Below
32 36493_at

33 574_s_at 

34 32224_at 

35 1077_at 

36 37280_at 

37 41200_at 

38 36009_at 

39 36933_at 

40 1126_s_at 

41 39824_at 

42 38078_at 

43 38127_at 

44 32941_at 

45 37276_at 

46 34768_at 

47 39781_at 

48 37918_at 



49 41490_at 

50 41814_at 



M33552 
M87507 

AB018312 
M29474 
U59912 



lymphocyte-specific protein 1 LSP1 
caspase 1 apoptosis-related cysteine CASP1 
protease interleukin 1 beta convertase 

KIAA0769 gene product KIAA0769 
recombination activating gene 1 RAG1 
MAD mothers against MADH1 
decapentaplegic Drosophila homolog 
1 

CD36 antigen collagen type I receptor CD36L1 Z22555 
thrombospondin receptor like 1 
hypothetical protein CL683 
N-myc downstream regulated NDRG 1 

Human cell surface glycoprotein CD44 
CD44 (CD44) gene, 3' end of long 
tailed isoform. 
ESTs 

filamin B beta actin-binding protein- FLNB 
278 

syndecan 1 SDC1 
interferon consensus sequence ICSBP1 
binding protein 1 



AF091092 
D87953 
L05424 

AI391564 
AF042166 

Z48199 
M91196 



IQ motif containing GTPase 
activating protein 2 

DKFZP564E1962 protein 

insulin-like growth factor-binding 
protein 4 

integrin beta 2 antigen CD 18 p95 
lymphocyte function-associated 
antigen 1 macrophage antigen 1 mac- 
1 beta subunit 

phosphoribosyl pyrophosphate 
synthetase 2 

fucosidase alpha-L- 1 tissue 



IQGAP2 U51903 



DKFZP564 

E1962 

IGFBP4 



AL080080 

U20982 

M15395 

Y00971 
M29877 



0.5356 Below 

0.5336 Below 

0.5326 Above 

0.5302 Above 

0.5283 Above 

0.5261 Above 

0.5259 Below 

0.5254 Below 

0.5232 Below 

0.5231 Above 

0.5208 Below 

0.5199 Above 

0.5195 Below 

0.5191 Below 

0.5184 Below 

0.5173 Below 

0.5162 Below 



0.5155 Below 
0.5101 Above 



5. SOM/DAV 

The 10,991 probe sets that passed the variation filter were used for subsequent
selection of discriminating genes using the self-organizing map (SOM) and
discriminant analysis with variance (DAV) programs in the GeneMaths software
package (version 1.5, Applied Maths, Belgium). The subgroups for which genes were
selected included T-lineage ALL, TEL-AML1, E2A-PBX1, MLL rearrangement, BCR-
ABL, hyperdiploid ALL (chromosomal number > 50) and the novel subgroup
described in the text of the paper. The target number of total genes chosen by each
algorithm was 500.

The SOM analysis was performed using a 30 x 18 node format to enable an
optimal number of genes per node (~20 genes per node). Nodes that contained genes
whose expression varied more than 2-fold from the mean in more than 70% of the
samples in a particular subgroup were chosen. A total of 451 genes were chosen
using the SOM algorithm and 443 genes using the DAV algorithm. The combined
gene sets contained 755 unique genes, of which 185 were present in both subsets. 2-D
hierarchical clustering of the genes and samples was performed using Pearson's
correlation coefficient as the metric and the unweighted pair group method using
arithmetic averages (UPGMA). Approximately 10% of the genes that were found to
have correlation coefficients less than 0.7 in each branch of the dendrogram were
removed, and the process was repeated iteratively until the correlation coefficient for
all genes within a branch was > 0.7, or until the removal of additional genes resulted in
a deterioration of the class distinction as indicated by inappropriate clustering of
cases. Through this approach a subset of 215 genes was selected that optimally
separated the 7 subgroups. These genes are listed in Tables 30-36. The selection of
genes by this approach does not provide for a ranking. For class prediction, between
20 and 30 genes were used for each genetic subgroup, unless otherwise stated.
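
The node-selection rule described above can be illustrated with a short Python
sketch. It is not the GeneMaths implementation. It assumes positive, untransformed
expression values, reads "varied more than 2-fold from the mean" as a value above
twice or below half of the gene's mean across all samples, and assumes that a node
qualifies only when every gene in it passes. The names varies_two_fold,
select_nodes, nodes and expression are hypothetical.

```python
from statistics import mean


def varies_two_fold(values, subgroup_idx, fold=2.0, min_fraction=0.70):
    """Return True if a gene deviates more than `fold` from its overall mean
    in more than `min_fraction` of the subgroup samples.

    Assumption: `values` are positive expression values for one gene across
    all samples, and `subgroup_idx` lists the sample positions of one subgroup.
    """
    overall = mean(values)
    hits = 0
    for i in subgroup_idx:
        v = values[i]
        # Count samples above fold * mean or below mean / fold.
        if v > fold * overall or v < overall / fold:
            hits += 1
    return hits / len(subgroup_idx) > min_fraction


def select_nodes(nodes, expression, subgroup_idx):
    """Return the ids of SOM nodes whose genes all pass the 2-fold / 70% rule.

    Assumption: `nodes` maps a node id to a list of gene ids and `expression`
    maps a gene id to its list of per-sample values.
    """
    return [
        node
        for node, genes in nodes.items()
        if genes and all(varies_two_fold(expression[g], subgroup_idx) for g in genes)
    ]
```

In this reading, a gene that is flat across all samples never qualifies, while a
gene that is strongly elevated only in the subgroup of interest does.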



Table 30. Genes selected by DAV-SOM for BCR-ABL

Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 39250_at | nephroblastoma overexpressed gene | NOV | X96584 | Above
2 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | Above
3 38312_at | DKFZp5640222 from clone DKFZp5640222 | | AL050002 | Above
4 38342_at | KIAA0239 protein | KIAA0239 | D87076 | Above
5 39712_at | S100 calcium-binding protein A13 | S100A13 | AI541308 | Above
6 39730_at | v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | X16416 | Above
7 39781_at | Insulin-like growth factor-binding protein 4 | IGFBP4 | U20982 | Above
8 40051_at | TRAM-like protein | KIAA0057 | D31762 | Above
9 40504_at | paraoxonase 2 | PON2 | AF001601 | Above
10 33362_at | Cdc42 effector protein 3 | CEP3 | AF094521 | Above
11 33404_at | adenylyl cyclase-associated protein 2 | CAP2 | U02390 | Above
12 34362_at | solute carrier family 2 facilitated glucose transporter member 5 | SLC2A5 | M55531 | Above
13 36591_at | Tubulin alpha 1 testis specific | TUBA1 | X06956 | Above
14 38077_at | collagen type VI alpha 3 | COL6A3 | X52022 | Above
15 40196_at | HYA22 protein | HYA22 | D88153 | Above
16 1911_s_at | Growth arrest and DNA-damage-inducible alpha | GADD45A | M60974 | Above
17 1702_at | interleukin 2 receptor alpha | IL2RA | X01057 | Above
18 1635_at | Human proto-oncogene tyrosine-protein kinase (ABL) gene, exon 1a and exons 2-10, complete cds. | ABL | U07563 | Above
 | Human proto-oncogene tyrosine-protein kinase (ABL) gene, exon 1a and exons 2-10, complete cds. | ABL | U07563 | Above
 | Caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | Above
 | Tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | Above


Table 31. Genes selected by DAV-SOM for E2A-PBX1

Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | Above
2 37479_at | CD72 antigen | CD72 | M54992 | Above
3 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | JJS03US | Above
4 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | Above
5 39929_at | KIAA0922 protein | KIAA0922 | AB023139 | Above
6 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | Above
7 41017_at | Myosin-binding protein H | MYBPH | U27266 | Above
8 41425_at | Friend leukemia virus integration 1 | FLI1 | M98833 | Above
9 41862_at | KIAA0056 protein | KIAA0056 | D29954 | Above
10 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | Above
11 37225_at | KIAA0172 protein | KIAA0172 | D79994 | Above
12 38285_at | mu-crystallin gene | | AF039397 | Above
13 38286_at | KIAA1071 protein | KIAA1071 | AB028994 | Above
14 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | Above
15 39379_at | cDNA DKFZp586C1019 from clone DKFZp586C1019 | | AL049397 | Above
16 39402_at | interleukin 1 beta | IL1B | M15330 | Above
17 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | Above
18 41139_at | melanoma antigen family D 1 | MAGED1 | W26633 | Above
19 41146_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | Above
20 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 | | AL049381 | Above
 | BUB3 budding uninhibited by benzimidazoles 3 yeast homolog | BUB3 | AF047473 | Above
22 36179_at | mitogen-activated protein kinase-activated protein kinase 2 | MAPKAPK2 | U12779 | Above
23 36589_at | aldo-keto reductase family 1 member B1 aldose reductase | AKR1B1 | X15414 | Above
24 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | Above
25 38438_at | Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 p105 | NFKB1 | M58603 | Above
26 1786_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | Above
27 1520_s_at | interleukin 1 beta | IL1B | X04500 | Above
28 1287_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | Above
29 854_at | B lymphoid tyrosine kinase | BLK | S76617 | Above
30 753_at | Nidogen 2 | NID2 | D86425 | Above
31 430_at | nucleoside phosphorylase | NP | X00737 | Above
32 362_at | Protein kinase C zeta | PRKCZ | Z15108 | Above




Table 32. Genes selected by DAV/SOM for Hyperdiploid >50

Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 36795_at | prosaposin variant Gaucher disease and variant metachromatic leukodystrophy | PSAP | J03077 | Above
2 38242_at | B cell linker protein | SLP65 | AF068180 | Above
3 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | Above
4 39628_at | RAB9 member RAS oncogene family | RAB9 | U44103 | Above
5 31863_at | KIAA0179 protein | KIAA0179 | D80001 | Above
6 33228_g_at | interleukin 10 receptor beta | IL10RB | AI984234 | Above
7 33753_at | KIAA0666 protein | KIAA0666 | AB014566 | Above
8 37543_at | Rac/Cdc42 guanine exchange factor GEF 6 | ARHGEF6 | D25304 | Above
9 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | Above
10 39039_s_at | CGI-76 protein | LOC51632 | AI557497 | Above
11 39329_at | Actinin alpha 1 | ACTN1 | X15804 | Above
12 39389_at | CD9 antigen p24 | CD9 | M38690 | Above
13 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | Above
14 32236_at | ubiquitin-conjugating enzyme E2G 2 homologous to yeast UBC7 | UBE2G2 | AF032456 | Above
15 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | Above
16 35764_at | chromosome X open reading frame 5 | OFD1 | Y15164 | Above
17 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | X02317 |
18 36937_s_at | PDZ and LIM domain 1 elfin | PDLIM1 | U90878 | Above
19 37326_at | proteolipid protein 2 colonic epithelium-enriched | PLP2 | U93305 | Above

22 39168_at

23 40903_at 



25 1065_at 

26 306_s_at 



clone 889N15 on chromosome Xq22.1- PSMD10 
22.3 . Contains part of the gene for a novel 
protein similar to X. laevis Cortical 
Thymocyte Marker CTX 

SMT3 suppressor of mif two 3 yeast SMT3H1 
homolog 1 

Ac-like transposable element ALTE 
ATPase H transporting lysosomal vacuolar APT6M8-9 
proton pump membrane sector associated 
protein M8-9 

ubiquitin specific protease 9 X chromosome USP9X 
Drosophila fat facets related 



fms-related tyrosine kinase 3 
high-mobility group nonhistone 
chromosomal protein 14 



FLT3 
HMG14 



AB018328
AL049929 



U02687 
J02621 



Above 
Above 



Above 
Above 



Table 33: Genes selected by DAV/SOM for MLL

Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 31492_at | Muscle specific gene M9 | | AB019392 | Above
2 36777_at | DNA segment on chromosome 12 unique 2489 expressed sequence | D12S2489E | AJ001687 | Above
3 39301_at | Calpain 3 p94 | CAPN3 | X85030 | Below
4 41448_at | Homeobox A4 | HOXA4 | AC004080 | Above
5 39424_at | tumor necrosis factor receptor superfamily member 14 herpesvirus entry mediator | TNFRSF14 | U70321 | Below
6 40076_at | Tumor protein D52-like 2 | TPD52L2 | AF004430 | Above
7 40493_at | Human cell surface glycoprotein CD44 (CD44) gene, 3' end of long tailed isoform. | CD44 | L05424 | Above
8 40506_s_at | Homo sapiens polyadenylate binding protein mRNA, complete cds. | | |
9 40514_at | hypothetical 43.2 Kd protein | LOC51614 | AF091085 | Above
10 40763_at | Meis1 mouse homolog | MEIS1 | U85707 | Above
11 40797_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | AF009615 | Above
12 40798_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | Above
13 41747_s_at | myocyte-specific enhancer factor 2A (MEF2A) gene | MEF2A | U49020 | Above
14 32193_at | Plexin C1 | PLXNC1 | AF030339 | Above
15 32215_i_at | KIAA0878 protein | KIAA0878 | AB020685 | Above
16 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | AI535946 | Above
17 34306_at | muscleblind Drosophila like | MBNL | AB007888 | Above
18 34785_at | KIAA1025 protein | KIAA1025 | AB028948 | Above
19 35298_at | eukaryotic translation initiation factor 3 subunit 7 zeta 66/67kD | EIF3S7 | U54558 | Above
20 36690_at | Nuclear receptor subfamily 3 group C member 1 | NR3C1 | M10901 | Above
21 37675_at | solute carrier family 25 mitochondrial | SLC25A3 | X60036 | Above
 | capping protein actin filament gelsolin-like | CAPG | M94345 | Above
23 38413_at | | DAD1 | D15057 | Above
24 39110_at | | EIF4B | X55733 |
25 39867_at | Tu translation elongation factor mitochondrial | TUFM | S75463 | Above
26 2062_at | Insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | Above
27 2036_s_at | CD44 antigen homing function and Indian blood group system | CD44 | M59040 | Above
28 1914_at | Cyclin A1 | CCNA1 | U66838 | Above
29 1327_s_at | mitogen-activated protein kinase kinase kinase 5 | MAP3K5 | U67156 | Above
30 1126_s_at | Human cell surface glycoprotein CD44 (CD44) gene, 3' end of long tailed isoform. | CD44 | L05424 | Above
31 1102_s_at | Nuclear receptor subfamily 3 group C member 1 | NR3C1 | M10901 | Above
32 873_at | homeo box A5 | HOXA5 | M26679 | Above
33 706_at | Glucocorticoid receptor, beta | | HG4582-HT4987 |
34 657_at | protocadherin gamma subfamily C 3 | PCDHGC3 | L11373 | Above




Table 34. Genes selected by DAV/SOM for Novel Class

Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 33137_at | latent transforming growth factor beta binding protein 4 | LTBP4 | Y13622 | Above
2 38081_at | leukotriene A4 hydrolase | LTA4H | J03459 | Above
3 38661_at | seb4D | HSRNASEB | X75314 | Above
4 39878_at | protocadherin 9 | PCDH9 | AI524125 | Above
5 35260_at | KIAA0867 protein | MONDOA | AB020674 | Above
6 1373_at | transcription factor 3 E2A immunoglobulin enhancer binding factors E12/E47 | TCF3 | M31523 | Above
7 35177_at | KIAA0725 protein | KIAA0725 | AB018268 | Above
8 38618_at | Human PAC clone RP3-515N1 from 22q11.2-q22 | LIMK2 | AC002073 | Above
9 34947_at | phorbolin-like protein MDS019 | MDS019 | AA442560 | Above
10 40692_at | transducin-like enhancer of split 4 homolog of Drosophila Espl | TLE4 | M99439 | Above
11 38364_at | BCE-1 protein | BCE-1 | AF068197 | Above
12 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | Above
13 994_at | Protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
14 31892_at | Protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
15 995_g_at | Protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
16 41073_at | G protein-coupled receptor 49 | GPR49 | AI743745 | Above
17 41708_at | KIAA1034 protein | KIAA1034 | AB028957 | Above
18 34376_at | protein kinase cAMP-dependent catalytic inhibitor gamma | PKIG | AB019517 | Below
19 37978_at | quinolinate phosphoribosyltransferase nicotinate-nucleotide pyrophosphorylase carboxylating | QPRT | |
20 38717_at | DKFZP586A0522 protein | DKFZP586A0522 | AL050159 | Below
21 33999_f_at | Human L2-9 transcript of unrearranged immunoglobulin V H 5 pseudogene | | X58398 | Above
22 36181_at | LIM and SH3 protein 1 | LASP1 | X82456 | Below
23 41202_s_at | conserved gene amplified in osteosarcoma | OS4 | AF000152 | Above
24 41138_at | Antigen identified by monoclonal antibodies 12E7 F21 and O13 | MIC2 | M16279 | Below
25 40771_at | Moesin | MSN | Z98946 | Above
26 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | Below
27 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | Below
28 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | Below
29 36650_at | cyclin D2 | CCND2 | D13639 | Below
30 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | Above
31 34168_at | deoxynucleotidyltransferase terminal | DNTT | M11722 | Above
32 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | Below
33 41213_at | peroxiredoxin 1 | PRDX1 | X67951 | Above
34 36571_at | Topoisomerase DNA II beta 180kD | TOP2B | X68060 | Above
35 253_g_at | clone GPCR W G protein-linked receptor gene (GPCR) gene, 5' end of cds. | | L42324 | Below
36 252_at | clone GPCR W G protein-linked receptor gene (GPCR) gene, 5' end of cds. | | L42324 | Above
37 2087_s_at | cadherin 11 type 2 OB-cadherin osteoblast | CDH11 | D21254 | Above
38 36976_at | cadherin 11 type 2 OB-cadherin osteoblast | CDH11 | D21255 | Above




Table 35. Genes selected by DAV/SOM for T-ALL

Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 35016_at | Human Ia-associated invariant gamma-chain gene, exon 8, clones lambda-y(1,2,3). | | M13560 | Below
2 36277_at | membrane protein (CD3-epsilon) gene | CD3E | M23323 | Above

4 38949_at

5 32649_at 



SH2 domain protein 1A Duncan s disease 
lymphoproliferative syndrome 

protein kinase C theta 

transcription factor 7 T-cell specific HMG- 



PRKCQ 
TCF7 



L01087 
X59871 



Above 
Above 



7 35643_at 

8 36473_at 

9 38319_at 

10 39709_at 

11 40775_at 

12 32794_g_at 

13 37039_at 

14 3805 l_at 

15 38095_i_at 

16 38096_f_at 

17 38415_at 

18 38833_at 

19 2059_s_at 

20 1241_at 



Human T-lymphocyte specific protein 
tyrosine kinase p56lck (LCK) aberrant 
mRNA, complete cds. 

nucleobindin 2 

ubiquitin specific protease 20 

CD3D antigen delta polypeptide TiT3 

complex 

selenoprotein W 1 

integral membrane protein 2A 

T cell receptor beta locus 

major histocompatibility complex class II 

DR alpha 

mal T-cell differentiation protein 

major histocompatibility complex class II 

DP beta 1 



major histocompatibility complex class II 
DP beta 1 

protein tyrosine phosphatase type IVA 
member 2 

Human mRNA for SB classll 
histocompatibility antigen alpha-chain 

lymphocyte-specific protein tyrosine kinase
protein tyrosine phosphatase type IVA 
member 2 



NUCB2 
USP20 
CD3D 

SEPW1 
ITM2A 
TRB 

HLA-DRA 
MAL 

HLA-DPB1 
HLA-DPB1 
PTP4A2 



LCK
PTP4A2 



X76732 

AB023220 

AA919102 

U67171 
AL021786 
X00437 
J00194 

X76220 
M83664 

M83664 

U14603 

X00457 

M36881 
U14603 



Above 
Above 
Above 

Above 
Above 
Above 



Above 
Below 

Below 

Above 

Below 

Above 
Above 



21 1105_s_at T cell receptor beta locus 



Table 36: Genes selected by DAV/SOM for TEL-AML1 



Affymetrix Gene Name 
number 



GeneSymbol Reference 
number 



1 31508_at upregulated by 1,25-dihydroxyvitamin D-3 VDUP1

2 33690_at cDNA DKFZp434A202 from clone

DKFZp434A202

3 34481_at vav proto-oncogene, exon 27, and complete VAV

cds.

4 36239_at POU domain class 2 associating factor 1 POU2AF1

5 37470_at Leukocyte-associated Ig-like receptor 1 LAIR1

6 38203_at Potassium intermediate/small conductance KCNN1 

calcium-activated channel subfamily N 
1 



S73591 
AL080190 



Z49194 
AF013249



Above/ 
Below 
Mean 

Above 
Above 



Above 
Above 
Above
7 38570_at major histocompatibility complex class II HLA-DOB

DO beta

8 38578_at tumor necrosis factor receptor superfamily TNFRSF7

member 7

9 38906_at spectrin alpha erythrocytic 1 elliptocytosis SPTA1

2

10 40729_s_at nuclear factor of kappa light polypeptide NFKBIL1

gene enhancer in B-cells inhibitor-like 1

11 40745_at adaptor-related protein complex 1 beta 1 AP1B1

subunit

telomeric repeat binding factor 2 TERF2 
KIAA0308 protein KIAA0308 
core-binding factor runt domain alpha CBFA2T3 
subunit 2 translocated to 3 

KIAA0212 gene product KIAA0212 
KIAA0342 gene product KIAA0342 
cDNA FLJ21697 fis clone COL09740 
transcription factor-like 5 basic helix-loop- TCFL5 
helix 

Phosphoinositide-3 -kinase class 3 
protein tyrosine phosphatase type IVA 



12 41097_at 

13 41381_at 

14 41442_at 

15 31898_at 

16 32660_at 

17 34194_at 

18 35614_at 

19 35665_at 

20 36008_at 

21 36524_at 

22 36537_at 

23 37280_at 

24 38652_at 

25 41200_at 

26 32224_at 

27 36985_at 

28 38124_at 

29 39824_at 

30 40570_at 

31 41498_at 

32 41814_at 

33 32579_at 



Rho guanine nucleotide exchange factor
GEF4

Rho-specific guanine nucleotide exchange
factor p114

MAD mothers against decapentaplegic
Drosophila homolog 1



hypothetical protein FLJ20154 
CD36 antigen collagen type I receptor 
thrombospondin receptor like 1 

KIAA0769 gene product 
isopentenyl-diphosphate delta isomerase 
midkine neurite growth-promoting factor 2 
ESTs 

forkhead box O1A rhabdomyosarcoma
KIAA0911 protein 
fucosidase alpha-L- 1 tissue 
SWI/SNF related matrix associated actin 
dependent regulator of chromatin subfamily 
a member 4 

34 33162_at insulin receptor INSR 

35 1779_s_at pim-1 oncogene PIM1 

36 1488_at protein tyrosine phosphatase receptor type PTPRK 



PIK3C3 
PTP4A3 

ARHGEF4 

P114-RHO- 
GEF 

MADH1 

FLJ20154 
CD36L1 

KIAA0769 
IDI1 
MDK 

FOXO1A
KIAA0911 
FUCA1 
SMARCA4 



X03066 

M63928 

M61877 
Y14768 

L13939 

AF002999 
AB002306 
AB010419 

D86967 
AB002340 
AL049313 
AB012124 

Z46973 
AF041434 

AB029035 

AB011093

U59912 

AF070644 
Z22555 

AB018312

X17025 

X55110 

AI391564 

AF032885 

AB020718 

M29877 

D26156 



X02160 
M16750 
L77886 



Above 
Above 



Above 
Above 
Above 

Above 
Above 
Above 
Above 

Above 
Above 

Above 

Above 

Above 

Above 
Above 

Above 
Above 
Above 
Above 
Above 
Above 
Above 
Above 



Above 
Above 
Above

37 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | Above
38 1336_s_at | protein kinase C beta 1 | PRKCB1 | X06318 | Above
39 1299_at | Telomeric repeat binding factor 2 | TERF2 | X93512 | Above
40 1217_g_at | protein kinase C beta 1 | PRKCB1 | X07109 | Above
41 1077_at | recombination activating gene 1 | RAG1 | M29474 | Above
42 932_i_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Above
43 880_at | FK506-binding protein 1A 12kD | FKBP1A | M34539 | Above
44 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | Above
45 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | Above
46 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | Above



C. Comparison of genes selected by the different metrics

There is a high degree of overlap between the genes chosen by the various
metrics; however, the top-ranked genes for each metric differ. Despite this, the top
genes selected by the various metrics are all able to accurately identify the leukemia
risk groups, as detailed below. As a result, a limited number of genes can be used to
accurately identify the genetic subtypes, and one can use non-overlapping lists and still
achieve high prediction accuracy. Thus, there are many genes that are distinct
discriminators of these seven risk groups, and one need only use a small subset of
these in a supervised learning algorithm to accurately identify a case as belonging to
the genetic subtype.

D. Decision tree for the diagnosis of genetic subtypes 

Classification was approached using a decision tree format, in which the first
decision was T-ALL versus B-lineage (non-T-ALL). Within the B-lineage subset,
cases were then sequentially classified into the known risk groups characterized by
the presence of E2A-PBX1, TEL-AML1, BCR-ABL, MLL chimeric genes, and lastly
hyperdiploid >50 chromosomes. Cases not assigned to one of these classes were left
unassigned. Classification was performed using the supervised learning algorithms
described below.
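
The sequential structure of this decision tree can be sketched in a few lines of
Python. This is only an illustration of the order of decisions stated above, not the
classifiers actually used. The function classify_case, the constant SUBTYPE_ORDER,
and the per-subtype predicate functions passed in are hypothetical; each predicate
stands in for a trained binary classifier that returns True or False for a profile.

```python
# Order of the B-lineage decisions, as listed in the text above.
SUBTYPE_ORDER = ["E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50"]


def classify_case(profile, is_t_all, subtype_classifiers):
    """Assign one case by walking the decision tree from top to bottom.

    Assumptions: `is_t_all` and every value in `subtype_classifiers` are
    callables that take an expression profile and return True or False.
    """
    # First decision: T-ALL versus B-lineage (non-T-ALL).
    if is_t_all(profile):
        return "T-ALL"
    # Within the B-lineage subset, test the known risk groups in sequence.
    for subtype in SUBTYPE_ORDER:
        if subtype_classifiers[subtype](profile):
            return subtype
    # Cases not assigned to any class are left unassigned.
    return "unassigned"
```

Because the tests are applied in a fixed order, a case that would satisfy two of the
B-lineage classifiers is assigned to whichever subtype appears earlier in the sequence.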

E. Description of Supervised Learning Algorithms 

An analysis of the profiles was performed using a linear classifier, C4.5, and a variety
of different non-linear classifiers. The non-linear classifiers consistently outperformed
the linear classifier. Therefore, only the description and data from non-linear
classifiers are included below.

1. Support Vector Machine (SVM)

A support vector machine (SVM) selects a small number of critical boundary
instances from each class and builds a linear discriminant function that separates them
as widely as possible (Witten and Frank, Data Mining: Practical Machine Learning
Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999, herein
incorporated by reference). In the case where no linear separation is possible, the
"kernel" technique is used to automatically map the training instances into a
higher dimensional space, and a separator is learned in that space. The Weka version
of SVM developed at the University of Waikato in New Zealand
(www.cs.waikato.ac.nz/ml/weka), which implements Platt's sequential minimal
optimization algorithm for training a support vector classifier using polynomial
kernels, was used (Platt, "Fast Training of Support Vector Machines Using Sequential
Minimal Optimization," Advances in Kernel Methods: Support Vector Learning,
Schölkopf et al., eds., MIT Press, 1998, herein incorporated by reference).
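
As a rough illustration of how a polynomial kernel enters the decision function of a
trained support vector classifier, consider the following Python sketch. It does not
reproduce the Weka code or its training procedure. The kernel form (x.y + 1)^d, the
function names poly_kernel and svm_decision, and the tuple layout of the support
vectors are assumptions made for this example; the coefficients would normally be
found by sequential minimal optimization.

```python
def poly_kernel(x, y, degree=2):
    """Polynomial kernel (x . y + 1) ** degree; an assumed, common form."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def svm_decision(x, support_vectors, bias=0.0, degree=2):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + bias.

    Assumption: `support_vectors` is a list of (vector, label, alpha) tuples
    with labels +1 or -1 and alphas already obtained from training.
    """
    total = bias
    for vector, label, alpha in support_vectors:
        total += alpha * label * poly_kernel(vector, x, degree)
    return total


def svm_predict(x, support_vectors, bias=0.0, degree=2):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if svm_decision(x, support_vectors, bias, degree) >= 0 else -1
```

Only the support vectors, the boundary instances mentioned above, contribute to the
sum, which is why a small number of training cases can define the separator.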

2. Prediction by Collective Likelihood of Emerging Patterns (PCL)

Emerging patterns (EPs) are a notion used in data mining to discover sharp
differences between two classes of data (Dong and Li, "Efficient Mining of Emerging
Patterns: Discovering Trends and Differences," Proc. 5th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 43-52
(1999), herein incorporated by reference). An EP is a pattern (in our case, the
expression levels of several genes) whose frequency increases significantly from one
class of samples to another class. In particular, the most general patterns with infinite
growth were identified, that is, patterns whose frequency is 0% in one class and
greater than 0% in another class, and none of whose proper subpatterns are EPs. These
EPs can then be combined into reliable rules for subtype prediction. Three earlier
methods for classification based on EPs are JEP (Li et al. (2001) Knowledge and
Information Systems 3:131-45, herein incorporated by reference), DeEPs (Li et al.,
"DeEPs: Instance-based Classification by Emerging Patterns," Proc. 4th European
Conference on Principles and Practice of Knowledge Discovery in Databases, pp.
191-200, 2000, herein incorporated by reference), and CAEP (Dong et al., "CAEP:
Classification by Aggregating Emerging Patterns," Proc. 2nd International
Conference on Discovery Science, pp. 30-42, 1999, herein incorporated by
reference).

In this analysis, an original variation in the spirit of JEP, but with a different
manner of aggregating EPs, was used. Given two training data sets $D_P$ and $D_N$ and a
testing sample $T$, the first phase was to discover EPs from $D_P$ and $D_N$. Denote the EPs
of $D_P$, in descending order of frequency, as $\mathrm{TopEP}^P_1, \ldots, \mathrm{TopEP}^P_i$, and those of
$D_N$ as $\mathrm{TopEP}^N_1, \ldots, \mathrm{TopEP}^N_j$. Suppose $T$ contains the following EPs of $D_P$:
$\mathrm{TopEP}^P_{i_1}, \ldots, \mathrm{TopEP}^P_{i_x}$, where $i_1 < i_2 < \cdots < i_x \le i$; and the following EPs
of $D_N$: $\mathrm{TopEP}^N_{j_1}, \ldots, \mathrm{TopEP}^N_{j_y}$, where $j_1 < j_2 < \cdots < j_y \le j$. In the next step, two
scores were calculated for $T$:

$$\mathrm{score}_P = \sum_{m=1}^{k} \frac{\mathrm{frequency}(\mathrm{TopEP}^P_{i_m})}{\mathrm{frequency}(\mathrm{TopEP}^P_m)}, \qquad \mathrm{score}_N = \sum_{m=1}^{k} \frac{\mathrm{frequency}(\mathrm{TopEP}^N_{j_m})}{\mathrm{frequency}(\mathrm{TopEP}^N_m)},$$

summing over $m = 1, \ldots, k$, where $k \ll i$ and $k \ll j$. In this case, $k$ was chosen to be 25.
Finally, a prediction was made on $T$ as follows: if $\mathrm{score}_P > \mathrm{score}_N$, then $T$ is
predicted to be in class $D_P$; otherwise, it is predicted to be in class $D_N$.

The spirit of this variation is to measure how far the top k EPs contained in T
are from the top k EPs of a class. For example, if k = 1, then score_P indicates
whether the number-one EP contained in T is far from the most frequent EP of D_P. If
the score is the maximum value 1, then the "distance" is very close, namely the most
common property of D_P is also present in this testing sample. With smaller scores, the
distance becomes greater and the likelihood of T belonging to D_P becomes weaker.
Using more than one top-ranked EP in this way leads to very reliable predictions.
This variation of the EP-based classification method was termed "prediction by
collective likelihood of EPs," or PCL for short.
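
The PCL scoring rule can be sketched in a few lines. This is an illustrative reading of the description above, not the original implementation: it assumes each EP is represented as a set of discretized items with a precomputed frequency, that the EP list is already sorted by descending frequency, and that when a sample contains fewer than k EPs the sum simply runs over those available. The names `pcl_score` and `pcl_predict` are hypothetical.

```python
def pcl_score(class_eps, sample_items, k=25):
    """Collective likelihood score of one class for one test sample.

    class_eps: list of (pattern, frequency) pairs, sorted by descending
               frequency; each pattern is a frozenset of items.
    sample_items: set of items present in the test sample.
    """
    # Frequencies of the class EPs that the sample contains, in rank order.
    contained = [freq for pattern, freq in class_eps if pattern <= sample_items]
    # Frequencies of the class's own top-ranked EPs.
    top = [freq for _, freq in class_eps]
    # Assumption: sum over as many terms as are available, up to k.
    n_terms = min(k, len(contained))
    return sum(contained[m] / top[m] for m in range(n_terms))


def pcl_predict(eps_p, eps_n, sample_items, k=25):
    """Predict class 'P' if score_P > score_N, otherwise class 'N'."""
    score_p = pcl_score(eps_p, sample_items, k)
    score_n = pcl_score(eps_n, sample_items, k)
    return "P" if score_p > score_n else "N"
```

With k = 1, a sample that contains the most frequent EP of a class receives the maximum score of 1 for that class, matching the interpretation given in the text.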

3. k-Nearest Neighbor (k-NN)

k-NN is a typical instance-based learner in which the class of a new instance is
decided by the majority class of its k closest neighbors (Cover and Hart (1967) IEEE
Transactions on Information Theory 13:21-27, herein incorporated by reference).
This method was used with the Euclidean distance metric. Conceptually, this is one
of the most straightforward methods and is often used as a baseline for comparison
purposes. The data were normalized using the z-score method, then the "best" few
genes were chosen using one of the statistical gene selection methods. For these
experiments, the "top n" genes, where n = 1-50, were used. The expression values of
the top genes from each diagnostic sample were treated as a vector in n-dimensional
space. To classify a new sample, the same top n genes were chosen, and the
Euclidean distance was computed between this new vector and each vector in the
training data. The prediction was made by a majority vote of the k nearest samples,
where k = 1 or k = 3. In this experiment, k was set to 1.
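
A minimal sketch of the k-NN procedure described above follows. It assumes that the z-score parameters are estimated per gene from the training samples and then applied to the new sample, and that gene selection has already reduced each sample to its top n genes; neither detail is spelled out in the text. The function names are hypothetical.

```python
import math
from collections import Counter


def fit_zscore(rows):
    """Per-gene mean and standard deviation estimated from the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def apply_zscore(row, means, sds):
    """Normalize one sample with previously fitted means and deviations."""
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def knn_predict(train_rows, train_labels, new_row, k=1):
    """Majority vote among the k training samples nearest in Euclidean distance."""
    means, sds = fit_zscore(train_rows)
    z_train = [apply_zscore(r, means, sds) for r in train_rows]
    z_new = apply_zscore(new_row, means, sds)
    ranked = sorted(zip((math.dist(z_new, r) for r in z_train), train_labels),
                    key=lambda pair: pair[0])
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 1, as used in this experiment, the prediction is simply the label of the single closest training sample.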

4. Artificial Neural Network (ANN)

The artificial neural network (ANN) learning models built are all feed-forward,
fully connected, and non-recurrent. The input layer of each ANN contains
50 units, which correspond to the 50 input values (the "top 50" scoring genes). Each
ANN has one hidden layer with 4 units, and an output layer that contains two units,
which represent the two class labels. In a preprocessing step, all input data were
normalized using the z-score method. The apparent error was estimated using 3-fold
cross-validation. That is, for each training procedure, the training samples were
randomly shuffled and divided into three groups of approximately equal size. A
model was built with two of the groups and the third group was set aside for
validation. This step was repeated three times, each time with a different group for
validation. This shuffling-training process was repeated ten times, resulting in 30
ANN models. Each test sample was fed into each of the 30 ANN models, and the
output was the average of the 30 outputs. The class predicted was the one that was
represented by the output unit with the larger average output value.
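
The repeated shuffling and three-way splitting, and the averaging of the 30 model outputs, can be sketched as follows. The network training itself is omitted; the sketch only generates the 30 train/validation partitions and combines per-model outputs. The function names, the fixed random seed, and the tie-breaking rule are illustrative assumptions.

```python
import random


def repeated_three_fold(n_samples, repeats=10, seed=0):
    """Yield (train_idx, valid_idx) for each of repeats x 3 shuffled splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Three groups of approximately equal size.
        folds = [order[i::3] for i in range(3)]
        for held_out in range(3):
            train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            yield train, folds[held_out]


def ensemble_predict(model_outputs):
    """Average the two output units over all models and pick the larger one.

    model_outputs: list of (output_unit_0, output_unit_1) pairs, one per model.
    Ties are resolved in favor of class 1 (an arbitrary assumption).
    """
    n = len(model_outputs)
    avg0 = sum(o[0] for o in model_outputs) / n
    avg1 = sum(o[1] for o in model_outputs) / n
    return 0 if avg0 > avg1 else 1
```

With the default of ten repeats, the generator produces the 30 partitions on which the 30 ANN models described above would be trained.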

F. Table of results using the different algorithms to predict the genetic subgroups

A summary of the true prediction accuracy on the blinded test set of 112 cases
is presented in Tables 37-39. Sensitivity was calculated as the number of positive
samples correctly predicted divided by the total number of truly positive samples.
Specificity was calculated as the number of negative samples correctly predicted
divided by the total number of truly negative samples.
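
The three quantities reported in Tables 37-39 can be computed from paired true and predicted labels as sketched below. The function name and the example counts in the usage note are hypothetical and are not taken from the test set.

```python
def binary_metrics(truth, predicted):
    """Return accuracy, sensitivity and specificity as percentages.

    truth, predicted: equal-length sequences of booleans, where True means
    the sample belongs to the subgroup being predicted.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # positives found
    fn = sum(1 for t, p in pairs if t and not p)      # positives missed
    tn = sum(1 for t, p in pairs if not t and not p)  # negatives confirmed
    fp = sum(1 for t, p in pairs if not t and p)      # negatives miscalled
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }
```

For instance, with a hypothetical subgroup of 6 positive cases among 112, of which 3 are detected and none of the 106 negatives is miscalled, sensitivity is 50%, specificity is 100%, and accuracy is 109/112. This shows how a small subgroup can combine high accuracy with low sensitivity.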






WO 03/083140 



PCT/US03/08486



Table 37. True Prediction Accuracy Results on Test Set using SVM and ANN algorithms

                            SVM                                   ANN
                            ChiSq    CFS    T-stats    SOM/DAV    Wilkins'
T-ALL       True Accuracy   100      100    100        100        100
            Sensitivity     100      100    100        100        100
            Specificity     100      100    100        100        100
E2A-PBX1    True Accuracy   100      100    100        100        100
            Sensitivity     100      100    100        100        100
            Specificity     100      100    100        100        100
TEL-AML1    True Accuracy   99       99     98         97         100
            Sensitivity     100      100    100        100        100
            Specificity     98       98     97         97         100
BCR-ABL     True Accuracy   95       97     94         97         97
            Sensitivity     50       67     33         83         83
            Specificity     100      100    100        98         98
MLL         True Accuracy   100      98     100        97         100
            Sensitivity     100      100    100        86         100
            Specificity     100      98     100        100        100
H>50        True Accuracy   96       96     96         95         94
            Sensitivity     100      100    100        95         100
            Specificity     93       93     93         93         89

Table 38. True Prediction Accuracy Results on Test Set using k-NN

                            k-NN
                            ChiSq    CFS    T-stats    Wilkins'
T-ALL       True Accuracy   100      100    100        100
            Sensitivity     100      100    100        100
            Specificity     100      100    100        100
E2A-PBX1    True Accuracy   100      100    100        100
            Sensitivity     100      100    100        100
            Specificity     100      100    100        100
TEL-AML1    True Accuracy   98       98     99         100
            Sensitivity     100      96     96         100
            Specificity     97       98     100        100
BCR-ABL     True Accuracy   94       97     95         93
            Sensitivity     33       67     50         67
            Specificity     100      100    100        96
MLL         True Accuracy   100      98     95         100
            Sensitivity     100      83     100        100
            Specificity     100      100    94         100
H>50        True Accuracy   98       96     94         98
            Sensitivity     100      100    95         100
            Specificity     96       93     93         96

Table 39. True Prediction Accuracy Results on Test Set using PCL

                            PCL
                            ChiSq    CFS
T-ALL       True Accuracy   100      100
            Sensitivity     100      100
            Specificity     100      100
E2A-PBX1    True Accuracy   ND       100
            Sensitivity     ND       100
            Specificity     ND       100
TEL-AML1    True Accuracy   99       ND
            Sensitivity     96       ND
            Specificity     100      ND
BCR-ABL     True Accuracy   97       ND
            Sensitivity     67       ND
            Specificity     100      ND
MLL         True Accuracy   100      ND
            Sensitivity     100      ND
            Specificity     100      ND
H>50        True Accuracy   98       ND
            Sensitivity     100      ND
            Specificity     96       ND

The assignment of a leukemic sample to a specific biologic subgroup is more
accurately reflected by its gene expression profile than by the presence or absence of a
specific genetic lesion. For example, four patients whose expression profiles were
classified as TEL-AML1, despite lacking a TEL-AML1 chimeric message by
reverse transcriptase polymerase chain reaction (RT-PCR), were found to have an
alteration in TEL, suggesting a common underlying biology. Thus, from a technical
viewpoint, gene expression profiling provides a viable alternative to standard
diagnostic approaches.

G. Absence of correlation of expression data for genetic subtypes with stage of
B-cell differentiation

The expression profiles of the different risk groups of B-cell leukemias do
not correspond to markers of different stages of B-cell differentiation. The first issue
is defining the stage of B-cell differentiation. The defined stages of BM-derived
B-cells relevant to pediatric ALL are outlined below in Table 40, along with their
frequency in pediatric ALL (Campana and Behm (2000) J. Immunological Methods
243:59-75). Three stages of differentiation are defined by a limited number of
markers. In Table 41 below, the distribution of the leukemia cases into these B-cell
differentiation stages is shown. As can be seen, none of the genetic subtypes is
specifically associated with one of these three stages of differentiation. Thus, this
simple analysis clearly shows that the majority of the chromosomal translocation
subgroups in pediatric ALL do not correspond to a specific stage of B-cell
differentiation. This is a well-known fact in the field of pediatric ALL and differs
from the relationship typically seen in B-cell lymphomas between chromosomal
translocations and other genetic lesions and the stage of differentiation.

Table 40. Immunophenotyping of acute lymphoblastic leukemias a

                Leukocyte antigen expression (% of cases positive)      Frequency
Subtype         CD19    CD22    cIgμ    sIgμ    sIgκ or λ               (%)
Early Pre-B     100     >95     0       0       0                       60-65
Pre-B           100     100     100     0       0                       20-25
Transitional    100     100     100     100     0                       1-3

Abbreviations: cIgμ, cytoplasmic immunoglobulin μ chain; sIgμ, surface immunoglobulin μ chain;
sIgκ or λ, surface immunoglobulin κ or λ chains

a D. Campana and F.G. Behm, "Immunophenotyping of leukemia", Journal of Immunological Methods
243: 59-75, 2000.





Table 41. Distribution of the leukemia cases into B-cell differentiation stages a

                EARLY PRE-B    PRE-B    TRANSITIONAL PRE-B
E2A             0              17       6
TEL             55             23       0
BCR             11             3        0
MLL             12             6        1
Hyperdip>50     49             9        5
Novel           8              4        1
Total           172            77       24

a For this analysis, samples with other immunophenotypes (NOS or mature B-cell) were not included

The next goal was to determine whether a set of genes could accurately
identify subjects by their stage of differentiation, regardless of leukemia risk group.
To accomplish this, cases were assigned to one of three classes, early pre-B, pre-B,
or transitional pre-B, based on their immunophenotype. The top 50 genes that
distinguished each group from the other two groups were selected using the Wilkins'
metric. These genes were then used in an ANN analysis to assess their performance,
through a process of cross validation, in correctly classifying the 273 diagnostic
B-lineage ALL samples for which a stage of differentiation could be determined. The
results of this analysis are included below.

Table 42. Accuracy Results for immunophenotype discrimination using
Wilkins' metric and ANN algorithm

                        Accuracy    Sensitivity    Specificity
Early Pre-B a           78.39%      85.47%         66.34%
Pre-B b                 71.79%      38.96%         84.69%
Transitional Pre-B c    91.24%      33.33%         96.79%

a Cells with CD19+, CD22+, cytoplasmic Igμ-, surface Igμ- immunophenotype
b Cells with CD19+, CD22+, cytoplasmic Igμ+, surface Igμ- immunophenotype
c Cells with CD19+, CD22+, cytoplasmic Igμ+, surface Igμ+ immunophenotype

The selected genes perform rather poorly in correctly assigning cases to specific
B-cell differentiation stages, with accuracies well below those achieved for prediction of
the genetic subgroups. When these genes were used in a two-dimensional hierarchical
clustering algorithm, they failed to cluster cases by immunophenotype, but instead
resulted in the loose clustering of some of the genetic subgroups, including
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, and hyperdiploid >50. The analysis was
repeated using genes selected by DAV and, again, no clustering of the
immunophenotypically-defined stages was observed. Thus, it was not possible to
identify expression profiles that can accurately identify the immunophenotypically-defined
differentiation stages of pediatric B-cell ALL. Moreover, the expression
profiles that were defined for the genetic subtypes are not profiles that correspond to
specific stages of B-cell differentiation. Although some of the genes that define
specific genetic subtypes can be associated with a particular stage of B-cell
differentiation, the majority of the discriminating genes show no correlation with
differentiation.

H. Results for relapse prediction

In the prediction of whether a patient would go into continuous complete
remission or would relapse, a subtype-specific approach was adopted. An individual
classifier was constructed for each subtype of ALL. Given a sample, the subtype was
first predicted, and then the corresponding subtype-specific prognostic classifier was
invoked to predict whether the patient would relapse. This subtype-specific approach
was required because an expression profile predictive of relapse for the entire group
could not be defined.

In the construction of the subtype-specific classifiers, genes were selected by CFS
unless this algorithm returned >20 genes, in which case the top 20 ranked genes by
T-statistics were used. When the T-statistics method was used, the number of genes
to use among the top 20 was chosen by performing cross validation experiments:
the top n genes were evaluated for n = 1..20, and the n that gave the best cross
validation results was selected. The cross validation results
for the optimal choice of genes are summarized in Table 43 below. The genes that
were chosen for use in subtype-specific relapse predictions are summarized in Table
44.
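
The gene selection rule just described can be sketched as follows. The sketch assumes that the CFS output, the T-statistics ranking, and a cross validation routine are supplied by the caller; `cv_accuracy` is a hypothetical callable standing in for the cross validation experiments, and keeping the smaller n when accuracies tie is an assumption not stated in the text.

```python
def choose_n(ranked_genes, cv_accuracy, max_n=20):
    """Try the top n genes for n = 1..max_n and keep the n with the best score.

    ranked_genes: genes ordered from best to worst by T-statistics.
    cv_accuracy: hypothetical callable mapping a gene list to a cross
                 validation accuracy.
    """
    best_n, best_acc = 0, float("-inf")
    for n in range(1, min(max_n, len(ranked_genes)) + 1):
        acc = cv_accuracy(ranked_genes[:n])
        if acc > best_acc:  # ties keep the smaller n (an assumption)
            best_n, best_acc = n, acc
    return best_n, best_acc


def select_genes(cfs_genes, t_ranked_genes, cv_accuracy):
    """Use the CFS genes unless CFS returned more than 20; otherwise fall back
    to the top-n T-statistics genes with n chosen by cross validation."""
    if len(cfs_genes) <= 20:
        return list(cfs_genes)
    n, _ = choose_n(t_ranked_genes, cv_accuracy)
    return t_ranked_genes[:n]
```

Because n is itself tuned on the cross validation results, the reported accuracy is optimistic to some degree, which is one motivation for the permutation experiments described later in the text.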

Table 43. Results of relapse prediction on indicated subgroups

                                                                P value by
            Relapse    CCR    # genes    metric     Accuracy    permutation test
T-ALL       8          26     7          t-stats    97          0.034
H>50        5          43     13         t-stats    100         0.018
TEL-AML1    3          56     7          CFS        100         0.145
MLL         5          7      4          t-stats    100         0.104
Others      4          56     20         t-stats    98.3        0.079



Table 44. Genes selected by T-statistics for relapse (T-ALL)

Gene Name                                                  GeneSymbol    Reference    Above/Below
                                                                         Number       Mean
Human TBXAS1 gene for thromboxane synthase                 TBXAS1        D34625       Above
Homo sapiens mRNA for 41-kDa                                             AB007851     Above
  phosphoribosylpyrophosphate synthetase-
  associated protein
Human DNA sequence from PAC 370M22                                       Z82206       Above
Human spinal muscular atrophy gene                         SMA5          X83301       Above
Human cell surface glycoprotein CD44                       CD44          L05424       Above
Human mRNA for KIAA0056 gene                               KIAA0056      D29954       Above
Human BTK region clone ftp-3 mRNA                                        U01923       Above



Table 45. Genes Selected by T statistics/CFS for relapse (Hyperdiploid > 50)

      Affymetrix     Gene Name                                                  GeneSymbol    Reference    Above/Below
      number                                                                                  Number       Mean
1     37721_at       deoxyhypusine synthase                                     DHPS          U79262       Above
2     38721_at       KIAA1536 protein                                           KIAA1536      W72733       Above
3     40120_at       hydroxyacyl glutathione hydrolase                          HAGH          X90999       Above
4     41386_i_at     KIAA0346 protein                                           KIAA0346                   Above
5     38677_at       stress 70 protein chaperone microsome-associated 60kD      STCH          U04735       Above
6     37620_at       Human TFIID subunits TAF20 and TAF15 mRNA, complete                      U57693       Above
7     34703_f_at     EST                                                                      AA151971     Above
8     38355_at       DEAD/H Asp-Glu-Ala-Asp/His box polypeptide Y chromosome    DBY           AF000984     Above
9     41214_at       ribosomal protein S4 Y-linked                              RPS4Y         M58459       Above
10    34530_at       Homo sapiens cDNA FLJ22448 fis clone HRC09541                            W73822       Above
11    603_at         nuclear receptor subfamily 2 group C member 1              NR2C1         M29960       Above
12    32697_at       inositol myo 1 or 4 monophosphate 1                        IMPA1         AF042729     Above
13    41129_at       KIAA0033 protein                                           KIAA0033      D26067       Above
14    33333_at       KIAA0403 protein                                                         AB007863     Above
15    37078_at       CD3Z antigen zeta polypeptide                              CD3Z          J04132       Above
16    38148_at       cryptochrome 1 photolyase-like                             CRY1          D83702       Above
17    39150_at       ring finger protein 11                                     RNF11         U69559       Above
18    33869_at       DKFZp586N1323 from clone DKFZp586N1323                                   AL080218     Above
19    41447_at       KIAA0990 protein                                           KIAA0990      AB023207     Above
20    39369_at       KIAA0935 protein                                           KIAA0935      AB023152     Above

Table 46: Genes selected by T-statistics/CFS for relapse (TEL-AML1)

     Affymetrix     Gene Name                                   Gene       Reference    Above/Below
     number                                                     Symbol     number       Mean
1    35797_at       Human interleukin-13 gene                   IL-13Ra    Y10659       Above
2    37524_at       Human death-associated protein kinase       DRAK2      AB011421     Above
3    34243_i_at     Human l(3)mbt protein homolog mRNA                     U89358       Above
4    41398_at       Homo sapiens mRNA, cDNA DKFZp564A186                   AL049305     Above
5    35195_at       H. sapiens mRNA for phosphate cyclase                  Y11651       Above
6    32393_s_at     Homo sapiens cDNA                                      W27466       Above
7    31909_at       Homo sapiens mRNA for KIAA0754 protein                 AB018297     Above

Table 47: Genes selected by T-statistics/CFS for relapse (MLL)

     Affymetrix    Gene Name                                          Gene        Reference    Above/Below
     number                                                           Symbol      number       Mean
1    294_s_at      Protein Kinase Pitslre, Alpha, Alt. Splice 1-Feb                            Below
2    38226_at      23h11 Homo sapiens cDNA                                        W27152       Below
3    1398_g_at     Human protein kinase (MLK-3) mRNA                  HUMMLK3A    L32976       Above
4    409_at        Human mRNA for 14.3.3 protein, a protein                       X56468       Below
                   kinase regulator



Table 48: Genes selected by T-statistics/CFS for relapse (Others)

      Affymetrix     Gene Name                                        GeneSymbol    Reference    Above/Below
      number                                                                        number       Mean
1     33782_r_at     nn82f03.s1 Homo sapiens cDNA, 3 end                                         Above
                     /clone=IMAGE-1090397
2     33338_at       Human transcription factor ISGF-3 mRNA                         M97936       Above
3     40242_at       Human (clone N5-4) protein p84 mRNA                            L36529       Above
4     37018_at       qd05c04.x1 Homo sapiens cDNA, 3 end                            AI189287     Above
                     /clone=IMAGE-1722822
5     38337_at       Homo sapiens zinc finger protein mRNA                          U62392       Above
6     41464_at       Human mRNA for KIAA0339 gene                     KIAA0339      AB002337     Above
7     38064_at       H.sapiens lrp mRNA                               LRP           X79882       Above
8     33173_g_at     yc89b05.r1 Homo sapiens cDNA, 5 end                            T75292       Below
                     /clone=IMAGE-23231
9     33365_at       Homo sapiens mRNA for KIAA0945 protein           KIAA0945      AB023162     Above
10    39367_at       m38e08.s1 Homo sapiens cDNA, 3 end                             AA522537     Above
                     /clone=IMAGE-979142
11    41108_at       Homo sapiens mRNA for putative GTP-              PGPL
12                   Homo sapiens heterochromatin protein p25 mRNA    P25beta
13    40359_at       Human DNA-binding protein (HRC1) mRNA            HRC1          M91083       Above
14    32792_at       Human DNA sequence from clone 465N24 on                        AL031432     Above
                     chromosome 1p35.1-36.13. Contains two novel
                     genes, ESTs, GSSs and CpG islands
15    34726_at       Human voltage-gated calcium channel beta                       U07139       Above
                     subunit mRNA
16    40299_at       Homo sapiens G-protein coupled receptor                        AF091890     Above
                     RE2 mRNA,
17    40704_at       H.sapiens mRNA for phosphatidylinositol
                     3-kinase
18    38568_at       Homo sapiens p53 binding protein mRNA
19    32038_s_at     wi30c12.x1 Homo sapiens cDNA, 3 end
                     /clone=IMAGE-2391766
20    39613_at       H.sapiens HUMM9 mRNA


I. Permutation test results

As the number of relapse samples was small, in addition to the usual cross validation
experiments, 1000 permutation experiments were performed for each subtype-specific
relapse study. In each permutation experiment, the samples were re-partitioned in a
manner that preserved class size by randomly swapping the class labels ("relapse" or
"continuous complete remission"). The same metric was then employed to pick the
same number of genes as in the original partitioning of the samples given by the
original class labels. SVM was then used to obtain a prediction accuracy by cross
validation for this random partition using these freshly selected genes. The
percentage of these 1000 permutation experiments that achieved the same accuracy as
the original samples was taken as a p-value, giving an indication of how many random
partitions of the original samples could achieve that accuracy. The results of these
permutation experiments
are summarized in the last column of Table 43 above. These results show that the
high accuracy obtained for the prediction of relapse in T-lineage ALL,
Hyperdiploid>50, and others is unlikely to be a random event. The higher p-values
obtained for the subtypes of TEL-AML1 and MLL are probably due to the small
number of relapse samples available for analysis.
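
The permutation procedure can be sketched as follows. The gene re-selection and SVM cross validation steps are represented by a single hypothetical callable, `evaluate`, which is assumed to take a list of class labels and return a cross validation accuracy. Counting permutations whose accuracy is greater than or equal to the observed accuracy is one reasonable reading of "the same accuracy" and is an assumption here.

```python
import random


def permutation_p_value(labels, evaluate, observed_accuracy,
                        n_permutations=1000, seed=0):
    """Fraction of label permutations that reach the observed accuracy.

    labels: original class labels ("relapse" / "CCR"), one per sample.
    evaluate: hypothetical callable that re-selects genes for the given
              labels, runs cross validation, and returns the accuracy.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # shuffling preserves the size of each class
        if evaluate(shuffled) >= observed_accuracy:
            hits += 1
    return hits / n_permutations
```

Because gene selection is repeated inside each permutation, the resulting p-value accounts for the selection step as well as for the classifier itself.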


Table 49. Permutation test results for predictors of T-ALL relapse

        Affymetrix     t-statistic
Rank    number         value          Perm 1%    Perm 5%    neighbors
1       33777_at       7.8337         7.3774     5.4783     6
2       41853_at       6.1727         6.5948     4.8117     16
3       38866_at       5.9890         6.0293     4.5611     12
4       41643_at       5.6106         5.6815     4.3877     12
5       1126_s_at      5.4777         5.5162     4.2375     11
6       41862_at       5.3734         5.3759     4.1208     11
7       41131_f_at     4.9134         5.2280     4.0295     17

Table 48 (continued): reference numbers for genes 17-20

      Reference    Above/Below
      number       Mean
17    Z29090       Above
18    U82939       Above
19    AI739308     Above
20    X74837       Above

Table 50. Permutation test results for predictors of Hyperdiploid > 50 relapse

        Affymetrix     t-statistics
Rank    number         value           Perm 1%    Perm 5%    neighbors
1       37721_at       8.7160          12.7358    9.9506     75
2       38721_at       8.4162          10.7256    8.8438     59
3       40120_at       7.2736          9.9837     8.0383     73
4       41386_i_at     6.3436          9.0552     7.5579     88
5       38677_at       6.2698          8.8633     7.2466     88
6       37620_at       6.2174          8.4154     6.9604     82
7       34703_f_at     6.0770          8.0982     6.8835     83
8       38355_at       5.5120          7.8657     6.7434     92
9       41214_at       5.4262          7.6583     6.6094     90
10      34530_at       5.4013          7.5991     6.5109     87
11      603_at         5.3142          7.5903     6.4409     87
12      32697_at       5.1785          7.5146     6.3265     90
13      41129_at       5.1450          7.3939     6.2121     88
14      33333_at       5.1061          7.2601     6.1389     87
15      37078_at       5.0738          7.1484     6.0308     86
16      38148_at       4.9256          6.9688     5.9230     93
17      39150_at       4.9061          6.9273     5.9015     93
18      33869_at       4.8256          6.8900     5.8367     93
19      41447_at       4.7919          6.8135     5.7621     93
20      39369_at       4.7790          6.7731     5.7391     92

Individually, the discriminating genes for relapse in T-ALL are significant at either
the 1% or 5% level, while those for hyperdiploid >50 fall at approximately the 7%
level.

Table 51. Results of relapse prediction on indicated subgroups

                                                                P value by
            Relapse    CCR    # genes    metric     Accuracy    permutation test
T-ALL       8          26     7          t-stats    97          0.034
H>50        5          43     13         t-stats    100         0.018
TEL-AML1    3          56     7          CFS        100         0.145
MLL         5          7      4          t-stats    100         0.104
Others      4          56     20         t-stats    98.3        0.079



As the number of relapse samples was small, in addition to the usual cross
validation experiments, 1000 permutation experiments were also performed for each
subtype-specific relapse study. In each permutation experiment, the samples were
re-partitioned in a manner that preserved class size by randomly swapping the class
labels ("relapse" or "continuous complete remission"). The same metric was
employed to pick the same number of genes as in the original partitioning of the
samples given by the original class labels. SVM was then used to obtain a prediction
accuracy by cross validation for this random partition using these freshly selected
genes. The percentage of these 1000 permutation experiments that achieved the same
accuracy as the original samples was taken as a p-value, giving an indication of how
many random partitions of the original samples could achieve that accuracy.
The results of these permutation
experiments are summarized in the last column of Table 51 above. These results show
that the high accuracy obtained for the prediction of relapse in T-lineage ALL,
Hyperdiploid>50, and others is unlikely to be a random event. The p-values for the
subtypes of TEL-AML1 and MLL are weaker than those for the other subtypes. However,
in the case of TEL-AML1 the number of relapse samples was exceedingly small (3), and in
the case of MLL the numbers of relapse and non-relapse samples were both very small.

J. Results for secondary AML prediction

For the secondary AML prediction, the same subtype-specific approach was
adopted as described earlier for relapse prediction. This time, only the TEL-AML1
subtype had a sufficient number of samples for a secondary AML prediction model to
be developed. For this model, the MIT score (Golub et al. (1999) Science 286:531-37,
herein incorporated by reference) was used to select genes, and SVM was used to
perform classification using these genes. The MIT score of a gene is defined as
$T = |\mu_1 - \mu_2| / (\sigma_1 + \sigma_2)$, where $\mu_i$ is the mean expression of that gene in the
i-th class and $\sigma_i$ is the
standard deviation of that gene in the i-th class. This formula assigns a higher value to a
gene that has a larger mean difference between the two classes and a smaller variance
within both classes. The 20 genes with the highest MIT scores in TEL-AML1 patients
that went into continuous complete remission versus those TEL-AML1 samples that
developed secondary AML are listed in Table 52 below. An accuracy of 100% for
secondary AML prediction was achieved on TEL-AML1 subtype
samples using these 20 genes. A permutation test was also performed in the same
manner as described earlier for the subtype-specific relapse prediction, and a
p-value of 0.031 was obtained, demonstrating that the predictability of the
development of secondary AML in TEL-AML1 patients was unlikely to be a
random event.
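
The MIT score and the selection of the highest-scoring genes can be sketched as follows. The text does not say whether the sample or the population standard deviation was used; the sample standard deviation is assumed here, and the function names and data layout are hypothetical.

```python
import statistics


def mit_score(class1_values, class2_values):
    """MIT score T = |mu1 - mu2| / (sigma1 + sigma2) for one gene.

    Each argument holds the expression values of that gene in one class.
    The sample standard deviation is used (an assumption).
    """
    mu1, mu2 = statistics.mean(class1_values), statistics.mean(class2_values)
    s1, s2 = statistics.stdev(class1_values), statistics.stdev(class2_values)
    return abs(mu1 - mu2) / (s1 + s2)


def top_genes(class1, class2, n=20):
    """Rank genes by MIT score and return the n highest.

    class1, class2: dicts mapping gene name -> list of expression values.
    """
    scores = {gene: mit_score(class1[gene], class2[gene]) for gene in class1}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

As a worked example, a gene with values 1, 2, 3 in one class and 7, 8, 9 in the other has means 2 and 8 and a sample standard deviation of 1 in each class, giving T = 6 / 2 = 3.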

Table 52. Genes selected by MIT score for secondary AML (TEL-AML1)

      Affymetrix     Gene Name                                         Gene         Reference        Above/Below
      Number                                                           Symbol       Number           Mean
1     34890_at       ATPase H transporting lysosomal vacuolar          ATP6A1
                     proton pump alpha polypeptide 70kD
2     40925_at       hypothetical protein FLJ10803                     FLJ10803     AA554945         Above
3     1719_at        mutS E. coli homolog 3                            MSH3         U61981           Above
4     32877_i_at     EST IMAGE:954213                                               AA524802         Above
5     32650_at       neuronal protein                                  NP25         Z78388           Above
6     33173_g_at     hypothetical protein FLJ10849                     FLJ10849     T75292           Above
7     32545_r_at     RSU-1/RSP-1                                       RSU-1        L12535           Above
8     34889_at       ATPase H transporting lysosomal vacuolar          ATP6A1       AA056747         Above
                     proton pump alpha polypeptide 70kD 1
9     35180_at       cDNA DKFZp586F1323 from clone DKFZp586F1323
10    34274_at       KIAA1116 protein                                  KIAA1116     AB029039         Above
11    35727_at       hypothetical protein FLJ20517                     FLJ20517     AI249721         Above
12    1627_at        tyrosine kinase (GB:Z25437)                                    HG2715-HT2811    Above
13    1461_at        nuclear factor of kappa light polypeptide         NFKBIA
                     gene enhancer in B-cells inhibitor alpha
14    36023_at       lacrimal proline rich protein                     LPRP         AI864120         Above
15    39167_r_at     serine or cysteine proteinase inhibitor           SERPINH2     D83174           Above
                     clade H heat shock protein 47 member 2
16    39969_at       H4 histone family member G                        H4FG         AA255502         Above
17    38692_at       NGFI-A binding protein 1 ERG1 binding             NAB1         AF045451         Above
                     protein 1
18    1594_at        polymerase RNA II DNA directed                    POLR2C
                     polypeptide C 33kD
19    33234_at       RBP1-like protein                                 LOC51742     AA887480         Above
20    34739_at       hypothetical protein FLJ20275                     FLJ20275     W26023           Above

Table 53. Permutation test results for secondary AML

Rank    Affymetrix     t-statistics    Perm 1%    Perm 5%    Perm      neighbors
1       34890_at       1.2204          2.7933     2.2138     1.4712    822
2       40925_at       1.0712          2.0006     1.7607     1.2884    859
3       1719_at        1.0599          1.8536     1.6272     1.1894    767
4       32877_i_at     1.0364          1.7125     1.5218     1.1200    715
5       32650_at       1.0217          1.6580     1.4584     1.0776    646
6       33173_g_at     1.0126          1.5868     1.4132     1.0416    595
7       32545_r_at     1.0097          1.5536     1.3630     1.0223    536
8       34889_at       0.9959          1.5164     1.3241     1.0009    512
9       35180_at       0.9854          1.4838     1.2938     0.9777    477
10      34274_at       0.9420          1.4759     1.2721     0.9600    550
11      35727_at       0.8493          1.4482     1.2507     0.9415    809
12      1627_at        0.8471          1.4207     1.2398     0.9254    782
13      1461_at        0.8312          1.4012     1.2260     0.9114    801
14      36023_at       0.8177          1.3551     1.2012     0.8995    813
15      39167_r_at     0.8136          1.3462     1.1806     0.8894    790
16      39969_at       0.8122          1.3395     1.1702     0.8785    759
17      38692_at       0.8109          1.3333     1.1565     0.8696    729
18      1594_at        0.8103          1.3142     1.1503     0.8626    696



Table 54: Additional Genes selected by T statistics for BCR-ABL risk group

Gene symbol      Accession Number
TUBA1            HG2259-HT2348
TUBA1            X06956
CRADD            U84388
SLC2A5           M55531
PHYH             AF023462
ZFPL1            AF001891
CD34             S53911
KIAA0015         D13640
CLECSF2          X96719
CD34             M81945
GAB1             U43885
E2F5             U31556
CLTB             M20470
ENG              X72012
LOC55884         AF038187
TNFRSF1A         M58286
TMSNB            D82345
SNL              U03057
KIAA0990         AB023207
MAP1A            W26631
MYPT2            AB007972
IFI30            J03909
ERPROT213-21     U94836
DKFZP586A0522    AL050159
LOC51109         AA126515
                 W29087
TSTA3            U58766
TNFRSF1B         AI813532
GSN              X04412
KIAA0582         AI761647
STATI2           AF037989
                 AL049313
ITGA4            X16983
FLJ20500         AA522530
SDR1             AF061741
ARHGEF4          AB029035
C18ORF1          AF009426
MAPK14           U19775
FHL1             AF063002
GATA3            X58072
KIAA0076         D38548
KCNN1            U69883
POM121L1         D87002
IFI30            J03909
ABL1             X16416
NELL2            D83018
MEST             D78611
S100A4           W72186
D12S2489E        AJ001687
ATP2B4           W28589
CTGF             X78947
RGS1             S59049
CDK9             X80230
                 AI524873
STIM1            U52426
VEGFB            INS 1 1 1
PPP2R2A          M64929
CASP2            U13022
SPS              U34044
HRK              D83699
KIAA0870         AB020677
ABL              U07563
PKIA             S76965
FLJ12474         AA306076
CD97             X94630
HCK              M16591
FYN              M14333
KIR2DL3          AC006293
DMPK             L08835
N33              U42360
FLJ13949         AL041879
PRKCZ            Z15108
IL17R            U58917
FMR2             U48436
INSR             M10051
AHNAK            M80899
KIAA0878         AB020685
CD86             U04343
                 U82303
KIAA1043         AL033538
N33              U42349
SYN47            Y17829
ITPR1            D26070
SFRS9            AL021546
EPOR             M60459
GAC1             AF030435
CAMK4            D30742
KIAA0084         D42043
LAT              AJ223280
XBP1             Z93930
FLT3LG           U03858
TESK1            D50863
                 A 1 070633
KIAA0681         U89358
FUT8             Y17979

Table 55: Additional Genes selected by T statistics for E2A-PBX1 Risk Group


Gene symbol 


Accession Number 


PBX1 


M86546 




AL049381 


FAT 


X87241 


BLK 


S76617 


TRF4 U52682 


GS3955 


D87119 


KIAA0802 


AB018345 


SCHIP-1 


AF070614 


SNL 


U03057 


KIAA0655 


AB014555 


GS3955 


D87119 
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IGFBP7 


L19182 


CDKN1A 


U03106 


CSF2RB 


II04668 " 


STATI2


AF037989 


KTAA1029 


AB028952 


KIAA0247 


D87434 




AL049397 


NP 


X00737 


jTM4SF2" 


I, f 0373 


ALOX5 


J03600 


jLRMP 


U10485 


[PTPN2 


AI828880 


ALOX5AP 


A1806222 


AEBP1 


AF053944 


TGFBR2 


D50683 


ODC1 


M33764 


|NID2 


D86425 


ODC1 


XI 6277 


CBX1 


U35451 


CSF3R 


M59820 


KIAA0172 


D79994 


ILIB 


M15330 


KIAA0922 


AB023139 


LOC51097 


AA005018 ! 


TUBA1 


X06956 


ITGA6 


S66213 


NFKBIL1 


Y14768 


ADPRT 


J03473 


ADPRT 


J03473 


CSF3R " 


M59818 


J I \B 


U09303 


CD9 ' 


M38690 


CDKN2D 


U40343 


KIAA0442 


AB007902 


PRKCZ 


Z15108 




AF055029 


RECK 


D50406 


GOLGA3 


D63997 


IZAP70 


L05148 


FLU 


M98833 


Ilaspi 


X82456 




AJ001381 


TBXA2R 


D38081 


BHLHB2 


AB004066 


AD ARB 1 


U76421 


PTPN6 


X62055 
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SCGK 


\F020044 


PRKACB 


'.V134181 


KCNN4 


|AF022797 


KCNN1 


IU69883 


MAPKAPK2 


IU12779 


PIN 


IAI540958 


TOP2B 


~|X68"060 


GATA2 


M68891 


IL1B 


1X04500 


PDE3B 


1U38178 


DGKD 


ID73409 


KIAA0993 


(AB023210 


AD AMI 0 


AF009615 


IGLL1 


IM27749 


PDLIM1 


IU90878 


PRKAR1A 


IM33336 


CD34 


IS53911 


GI.A 


1U78027 
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BAZ1B 


<\F072810 


EFNA1 


U57730 


FADS3 


&C004770 


FLT3 


U02687 


LOC57228 


AF091087 


BCL6 




BMP2 




CD22 


X59350 


KIAA0429 


ArSUU / ooy 


DKFZP434C171 


at nsn i f>Q 


CTBP2 


AF016507 




M11810 


SIAT9 


AB018356 


ICYBB 


X04011 


AKR1B1 


X15414 


NFKBIL1 


Y14768 


|uBE2Vl 


U49278 


|D0C-1R 


_\F0S9814 _ j 


BUB3 


AF047473 " J 


IL7R 


M29696 j 


ACK1 


L13738 


ENIGMA 


L35240 


KIAA1071 


AB028994 


jGL 


AI932613 i 


MN1 


X82209 _ .j 


K1AA0823 


AB020630 


NFKB1 


M58603 


CD24 


L33930 


YWHAQ 


X56468 


VDAC1 


L06132 


P85SPR 


D63476 


SYNGR1 


AL022326 


NDR 


Z35102 


J MJ 


AL021938 


PRSC1 


D55696 ~ 


MRC1 


M93221 




AI184710 


CR1P1 


AI017574 


KIAA0056 


D29954 




AF039397 




U79265 


SLAM 


U33017 


LYL1 


AC005546 


KIAA0620 


AB014520 


VDAC1P 


AJ002428 


SRP9 


AF070649 
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PRDX1 


X67951 


SLC9A3R1 


AF015926 


CD72 


M54992 


ECM1 


U68186 


PPP2R5A 


L42373 


HDGF 


D16431 


MERTK 


U08023 






L02326 




CD34 


M81945 


IL17R 


U58917 


ARL7 


AB016811 




P4HA2 


U90441 


BZRP 


M36035 


F13A1 


M14539 




JKRAS2 


M54968 




IBS69 


X86098 




SORP150 


U65785 






D28915 




AL049409 


SH2D1A 


AL023657 


LY6E 


U66711 




FACVL1 


D88308 




EPB42 


M60298 






AL049471 




(bmiI ~ " 


L13689 




jKCNJ13 


N36926 




!N33 


U42349 




IVIL2 


X51521 




joCNG2 


U47414 




[cFsORFl 


AF009425 




pNUMAl " 


Z11584 




IDBNI 


U00802 


q 


iFLT3 


U02687 




felAA0854 


|AB02066l" 




MGC4175 ' 


IAI656421 




KIAA1012 


Kb02322sT 




CIRBP 


P78134 




ST5 


U15131 




Ikiaaoooi 


D13626 


ICCR1 


D10925 


'CD 19 


M28170 


SNRPE 


AA733050 


CR2 


[M26004 


HEXA 


M16424 


IFIT4 


AF026939 




W26667 
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EPOR 


M60459 


TMSNB 


D82345 


GCLM 


L35546 


H41 


H15872 


TUBB2 


HG1980-HT2023 


TNFAEP2 


M92357 


GAB1 


U43885 


fpTPRKT" 


L77886 


BCL7A 


X89984 



Table 56: Additional Genes selected by T statistics for Hyperdiploid >50 Risk Group

Gene symbol     Accession Number
SH3BP5          AB005047
FLT3            U02687
MX1             M33882
NPY             AI198311
SOD1            X02317
PTPRK           L77886
IL1B            X04500
CD9             M38690
FLT3            U02687
PGK1            V00572
EFNB1
FOS             K00650
IL1B            M15330
MRC1            M93221
HMG14           J02621
SNRP70          X06815
PDLIM1          U90878
ALOX5           J03600
RAG2            M94633
CALM1           U12022
KIAA1013        AB023230
NDUFA1          N47307
FOS             V01512
DXS1357E        X81109
ICSBP1          M91196
ETS2            J04102
PCDH9           AI524125
LILRA2          AF025531
PSAP            J03077
SCHIP-1         AF070614
CCND2           D13639
KCNN1           U69883
ALTE            AB018328
IGFBP4          U20982
M9              AB019392
SCML2           Y18004
LOC51632        AI557497
UBE2G2          AF032456
STATI2          AF037989
ATRX            U72936
APT6M8-9        AL049929
PTPRE           X54134
GILZ            AI635895
PECAM1          AA100961
ARHGEF4         AB029035
ECM1            U68186




Table 57: Additional Genes selected by T statistics for the MLL Risk Group


Gene symbol 


Accession Number 


EPOR 


M60459 


CD44 


L05424 


PRKCII 


M55284 


MADH1 


U59423 


KLF1 


U65404 


MME 


J03779 


PTPRK 


L77886 


IL1B 


X04500 


lYESl 


M15990 


ARPC2 


U50523 


IGFBP4 


M62403 


ITPR3 


U01062 




M13929 


EFNB1 


U09303 


FHIT 


U46922 


NME2 


X58965 


CCND2 


X68452 


MPB1 


M55914 
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CDH2 


M34064 


IGFBP7


L19182 


ALOX5 


J03600 


PTGDR 


U31099 


PLXNC1 


AF030339 


EIF3S2 


U39067 


BLVRA 


X93086 


HSPC022 


W68830 




S67247 


MYLK 


U4S959 


SLC6A11 


S75989 




X67098 


SERPINB1 


M93056 


LGALS1 


AI535946 


HRK 


D83699 




AL049313 


HBS1L 


AB028961 


KIAA0437 


AB022660 


GDI2 


Y13286 


ITGA4


X16983 


EEF1B2 


X60489 


MD-1 


AB020499 


POU4F1 


X64624 


TST 


X59434 


PTPRF 


Y00815 


AJKfcLG.hr 4 




SCHIP-1 


ArU /Uol4 


ASMTL 


AAOOy lyy 


DDR1 


L20817 


N33 




CR2 


M26004 


AHNAK


M80899 


SCGF 


AF020044 


|^___ 


U28389 


PSPHL 


AJ001612 


MADH1 


U59912 


ITPR3 


U01062 


DPEP1 


J05257 


AKAP12 


T JQ 1 f.C\l 

UoloU/ 


DBI 


AJD j /Z4U 


VTA \f\11& 

IviAAU / JO 


AR01 897Q 


MAL 


X76220 


S100A4 


W72186 


MDK 


X55110 


CRK _ 


D10656 






CAPG J 
KCNH2 I 
KIAA1069 


M94345 

U04270 
AB028992 


DKFZP564L0862 


AL080091 


DGKD 
DEPP 


AB002296 
D73409 

~AB022718 j 
AL049957 


CD8B1 


X13444 


EFNB1 


U09303 




AI391564 


LDOC1 

EFNA1 


AB019527 


1V1 J / / JU 


ICD44 




iPTPRC 


Y UUWOZ 


pTPRC 


Y00638 


PTPRC 


Y00638 


Itfpi : 

ILtl-L 1 

jTSPAN-5 


M59499 


AF065389 


[bcli ia 

KIAA1011 


Q 

V\ J. 1 0 1 J 

AL080133 


FYB 


U93049 


DKFZp761F2014 


AA149431 


FGFR1 


X66945 


M.63589 


(PTPN6 | X62055 J 




Table 58: Additional Genes selected by T statistics for the Novel Risk Group


Gene symbol 


Accession Number 


CHST2 


AB014679 


CLTC 


D21260 


TUBA1 


X06956 


•gngh 


U31384 


PCDH9 


AI524125 


MDS019 


AA442560 


1X02 


M94633 


ITGA6 


X53586 


UBE2E3 


AB017644 


CD34 


S53911 


CD34 


M81945 


Ifgfri 


M34641 
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ECM1 


U68186 


MADH1 


U59423 


FUT7 


AB012668 


PROML1 


AF027208 


CSNK2A1 


M55265 


FLNB 


AF042166 


MADH1 


U59912 


LIG4 


X83441 


ZNF151 


Y09723 


CSF3R 


M59818 




AL080205 


STAU2 


AL079286 1 


AEBP I 


AF053944 


KIAA0320 


AB002318 


KIAA0746 


AB018289 


r PTPRM 


X58288 


IGFBP4 


M62403 


ZNF266 


AA868898 


PDLIM1 


U90878 


MTMR3 


AB002369 


TIMP1 


D11139 


TTC2 


W28595 


TM4SF2 


L10373 


PSA 


AA978353 


HTR4 


Y12505 


MMS19L 


AI'007151 




r AI391564 


TJP2 


L27476 


BMP2 


M22489 


ARL7 


AB016811 


TLR1 


AL050262 


SMC2L1 


AF092563 


TGFBR2 


! D50683 


TGFBR2 


D50683 


SPARC 


J03040 


CiPRK5 


L15388 


CDH2 


M34064 


KIAA0877 


AB020684 


ABLFM 


F D31883 


RNF3 


1 W25793 


CCBP2 


1 U 94888 ~ 


CHN2 


U07223 


ITGA4 


X16983 


IQGAP2 


U51903 


FLJ22531 


W80358 


PIK3CD 


U86453 
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|FXYD2 f H9488T 





W30677 


AiVli: Ud 


U29926" 




D78577 


TTTA A ("11 9^ 


D50915 




AC004770 


DKFZP434C171 


AL080169 


Ho i uuuyo 


AI885170 




M22489 


T TT RR4 


AF072099 




AB007889 


ij jsJt 1 Zj± j o o Kjyj j 


AL050289 




U92818 


ATIC 


" D82348 


MONDOA 1 


AB020674 


CNK1 


AF100153 


NGFR 


Ml 4764 


KIAA0540 


AB011112 


MYO10 


ABO 18342 


PIASX-BETA 


AF077954 


ACVR1 " 


Z22534 


ARHGEF10 


AB002292 




AF001601 


TST 


X59434 


SPTBN1 


M96803 




AA079018 


PRSC1 


D55696 


DKFZP434D174 


AL080150 




— — 


All 847 10 




X13444 





U 792 65 


T\KV7nlM F901 4 


AA 149431 


MEF2A 


U49020 


JAG2 


AF029778 


ZNF143 


AF071771 


CASP1 


U13697 


HAP1 


AF040723 


FABGL 


F D82061 " 


ALDH1 " 


I K03000 


RAD9 


U53174 




AL109722 


CDC27 


T~ AA166687 


B4GALT1 


D29805 






|ptprm 


X58288 


LAHR 


LI 9872 


|N33 


U42349 


IL12RB2 


U64198 I 


MTR 


U73338 


KIAA0697 


AB014597 


CSNK2B 


M30448 




U15590 




W28612 


HSU79253 


AF052186 


RBBP1 


S57153 


S100A11 


D38583 


TCF12 


M80627 




AI971169 


EEF1E1 


N32257 


SAP18 


AW021542 


PVRL1 


AF060231 




M13929 ~~ 


MKP-L 


AF038844 




W26667 


CD79B 


M89957 


KIAA0437 


AB022660 




AF070633 


GCLM 


L35546 


EDG6 


AJ000479 


MAL 


X76220 




Table 59: Additional Genes selected by T statistics for the T-ALL Risk Group


Gene symbol 


Accession Number 


SLP65 


AF068180 


CD3D 


AA919102 


ISH2D1A 


AL023657 


CD79B 


M89957 


CD3E 


M23323 


CTGF 


X78947 


"PFTKl 


AB020641 


TRB 


X00437 


CD24 


L33930 


ICD22 


X52785 


TOP2B 


X68060 


CD22 


X59350 


TCL1A 


X82240 


BRAG 


AB011170 


CD79A 


U05259 


SCHIP-1 


AF070614 
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MAL 


X76220 


HLA-DQB1 


Ml 6276 


PDE4B 


L20971 


HLA-DQB1 


M60028 


CD19 


M28170 


KIAA0959 


AB023176 


LILRA2 


AF025531 


FIPN18 


X79568 


MEF2C 


L08895 


PTP4A2 


U14603 


TMPY 


AI198311 


GAB1 


U43885 


Ick 


U23852 


TCP 7 
TERF2 


X59871 
X93512" 


ITM2A 


AL021786 


MEF2C 


S57212 


T 1 C9A3E 1 


AF015926 


liNG 


X72012 


DEPP 


AB022718 


[LIB 


X04500 


jiLIB 


M15330 


[ecmi 


U68186 


HLA-DMA 


X62744 


CRMP1 


D78012 


WFS1 


AF084481 1 


PRKCQ 
GNG7 


L01087 


ABO 10414 




X58398 


CDKN1A 


U03106 


CD9 


M38690 


PTK2 


L13616 


TRB 


M12886 


IE 135 


L78833 


NUCB2 


X76732 


KIAA0942 


AB023159 


VATI 


U18009 


ARL7 


AB016811 


USP20 


AB023220 


PLCG2 


XI 4034 


PRDX1 


X67951 


POU2AF1 


Z49194 


CMAH 


D86324 


ALOX5 


J03600 


PTPN7 


M64322 


MEF2C 


S57212 
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KIAA0668 
LOC54103 _ ___ 
EFNB1 


AL021707 
AL079277 
U09303 


HELOl 


AL034374 


ADF 


S65738 


KIAA0906 


AB020713 


IGFBP4 


U20982 


LDHB 


X13794 
U03100 


CTNNA1 

EN02 


X51956 


LAT 


AJ223280 


PTPN7 


D11327 




M16942 


CSRP2 

"gla 


U57646 
U78027 


'ADA 


X02994 


[rgsTo ~ 

KIAA0870 _ __ ___ 

CD3Z 


AF045229 
AB020677 
J04132 


STATI2 


AF037989 


GSN 


X04412 


INSR 

PLA-DNA 

1CD72 


X02160 
M31525 
M54992 


EPHB6 1 D83492 _____ 
MYLK I U48959 
HLA-DQA1 AA868382 


LCK 1 M36881 


F1IL1 


AF063002 


CRIM1 


AI651806 


AQP3 


N74607 


HLA-DQB1 


M81141 


GNG11 
LARGE 
FOX01A 


U31384 

AJ007583 
AF032885 


NPR 1 


XI 5357 


GAB1 


U43885 


PTPRE 


X54134 


PDLIM1 


U90878 


NCF4 


AL008637 


ARHGEF4 


AB029035 


PTP4A2 


U14603 


CTNNA1 


AF1 02803 


SEPW1 


U67171 


CHI3L2 


U58515 


LILRA2 


U82277 
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CD79A I 

TCLIB 

TCF4 


U05259 TZTl 

AB018563 " 
""M74719 


TACTILE 


M88282 




AB002438 

AI653621 " 


TXN 


ADE2H1 


X53793 




AL049449 


GLUL 


X59834 


ZFHX1B 


AB011141 


1*411 B 


M22806 


jBFTTMl 


J04164 


.KIAA0182 


D80004 


SH2D1A 

gnaTI 


AF100539 
M69013 j 


NCF4 
'SLC2A5~ 

KL -VDPBl 


AL008637 J 
M55531 ! 
AB023210 j 
M83664 


jHLXl 


M60721 


jCTNNAl 


D14705 


FADS3 


AC004770 


i GAT A3 1 X58072 


Y13286 


TM4SF2 ~ L10373 


IGNA15 M63904 ' 


:BTG2 

Iraqi " 

|mdk 


U72649 

M29474 
X55110 




X00457 


AKR1C3 


D17793 


SLA 


D89077 


jLDHA 

PTPRC 


X02152 

| AL049279 

Y00638 


BMP2 


t M22489 


ERG 


M17254 


ICSBP1 


M91196 


CCT2 
AKAP2 


AF026166 


AB023137 
X58398 


KTAA0128 


D50918 


IGHM 1 X58529 _ 


NOTCH3 


U97669 


JUP 


M23410 


DKFZP5 860 1624 


AL039458 
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MYO10 


AB018342 


CTNNA1 


L23805 


NOS2A 


U31511 




D00749 




L29376 


ICB-1 


AF044896 


GNAI1 


AL049933 


S100A11 


D38583 


MAPKAPK3 


U09578 


ADA 


M13792 


_____ _ 


AI541308 


VDAC3 


AF038962 




AL049265 


TRIM _ _ 


A TOO A OHQ 

AJ 45 'h 


CTBP2 


AtUIojU / 


F13A1 




ZNF43 


HG620-HT620 


UisJrZp /a IrzU 1 4 


A A 1 _Q43 1 


KIAA0442 


AB007902 


C'TNNAl 


U03100 


CD2 


M16336 


BMP2 


M22489 


HSPC022 


W68830 


ICAM3 


X69819 


NCF4 ~ 


X77094 


GS3955 


D87119 


CTSC 


X87212 


GH1 


V00520 


ARPC2 


U50523 


EJ] \-l)RBl 


M32578 


GAS1 


L13698 


LAMB2 


M552I0 


EPHB4 


U07695 


C0X8 


AI525665 


KIAA0618 


N29665 


KIAA0870 


AI808958 


PIK3CG 


X83368 


IGIID 


1 K02882 


IRF4 


U52682 


HSPCB 


M16660 


CAPN3 


X85030 


CD6 


X60992 


WSX-1 


AI263885 


FXYD2 


H94881 


PTK2 


HG3075-HT3236 
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FIJCA! 


M29877 


IFADS2 


AL050118 


IKARS 


D32053 


DSCR1 


U85267 


SOX4 


X70683 


TRD 


X73617 


MHC2TA 


U18259 




AL049435 


MDK 


M94250 


ICALMl 


U12022 


PCLO 


AB011131 




A1391564 


FHII 


U46922 


MONDOA 


AB020674 


TRG 


M30894 


SPDB 


X66079 


FLJ10097 


AL035494 


TAGLN2 


D21261 


LGALS9 


Z49107 



Table 60: Additional Genes selected by T statistics for the TEL-AML1 Risk Group




Gene symbol 


Accession Number 


ARHGEF4 


AJB029035 


|TNFRSF7 


M63928 


IPCLO 


AB0U131 


TCFL5 


AB012124 




U69883 


NMC2 


X58965" 


PTPRK 


L77886 




AL049313 


TERF2 


X93512 


UN G'1'1 


U31384 


RAG1 


M29474 




AL086190 


MADH1 


U59423 




HG3523-HT4899 


MADH1 


U59912 


P114-RHO-GEF 


AB011093 




L29254 


MDK 


M94250 


TERF2 


AF002999 


CRMP1 


D78012 
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HLA-DOB | 




NFlvBJLI 










AL080059 


(. or \_ 1 - 1 


AB010419 

_____ 


M5lK 




PIK3C3 




A [ OX"- ; 


J03600 


PTP4A3 


A 17 Pi A 1 A3 A 
Af (J4 14 D 4 


rUUzAr 1 


249194 


POU4F1 j 




PRKCB1 


^-Qy j Op 


GCAT 


Z97630 


PHYll 




[SPTAi 


M61877 


|MI 

iFYB 


3Q7025 


T I i I Pi ' ' i 

uyjU4y 


ITPR1__ 




'GTT1 




FADS3 




CCT2 


AF026166 


ISG20~ '__ 


" 1)88964 


G/T-JTD 1 


AF070614 


DR6 _ 


' AI 068S68 " "j 


TV/TW^I Pi 

MYU1U 


ABO 18342 




LI 1672 


T-STAR 

FUCA1 


_ AF051321 

M90S77 


ttt a nr\D 1 


IVlOUv/Z-O 

AB002438 




"~ X78947 


FKBP1A 


™ AT3Q1 Sfi4 


„ _ _ — 

RAB1 


_ _ 




X02160 


k 1 -A A104U 


AP101 1 1 1 ^ 


1 iV14i>rz _ 




CASP1 


ivio / ju / 


MT1L 


1 A AO'iJ^T 


MME 


J03779 




1 '~AT7ili^qq 


KARS 


D32053 


CHN2 


1 U07223 


IQGAP2 


1 U51903 


KIAA0906 


I AB020713 


STATI2 


1 AF037989 
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HLA-DMA 


X62744 


r CD36Ll 


Z22555 


PRKCB1 


"~ X06318 


GS3955 


D87119 


ACTN1 


X15804 


FLJ20154" " 


AF070644 


KIAA0769 


AB018312 


_ 


Z48199 


SOX4 


X70683 


NRTN 


U78110 


CTNND1 


AB002382 


FH1T 


U46922 


FARP1 


AI701049 


FOX01A 


AF032885 


NPY 


All 983 11 


)VDUP1 


S73591 


H2AM) 


AI885852 


(tactile 


M88282 


jSNL 


U03057 


Ijup 


M23410 


NR3C2 


M16801 


PRPS2 


Y00971 


LELRA2 


AF025531 


RNAHP 


H68340 


DPYSL2 


U97105 


ITGB2 


M15395 


=PCDII9 


AI524125 


LAIR1 


AF013249 


CD79A 


U05259 


NFKBIL1 


Y1476S 


PCCA 


S79219 


HLA-DMB 


1 U15085 


SMARCA4 


] D26156 



EXAMPLE 2 

To identify additional genes whose expression levels could be used as a
diagnostic tool to identify ALL subgroups, leukemic blasts from 132 diagnostic
samples were analyzed using higher density oligonucleotide arrays that allow the
interrogation of a majority of the identified genes in the human genome.

A subset of the 327 diagnostic pediatric ALL samples described above was
reanalyzed using these higher density microarrays. Case selection was based on
providing a representation of the known prognostic ALL subtypes including
t(9;22)[BCR-ABL], t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1], rearrangement in the
MLL gene on chromosome 11q23, and hyperdiploid karyotype with >50
chromosomes. Since the goal was to define expression profiles that could be used to
accurately diagnose the known prognostic subtypes of ALL, we chose to
over-represent these subtypes compared to what is normally seen in a random
population of childhood leukemia patients. A total of 132 samples met these
criteria and had sufficient material remaining to be used for this analysis. The
list of samples and subtype distribution of the cases used in this study are shown
in Tables 61 and 62, respectively.



Table 61. Diagnostic ALL samples used for class prediction (n=132) 



BCR-ABL-#1 


Hyperdip>50-C18 


Pseudodip-#6 


BCR-ABL-#2 


Hyperdip>50-C21 


Pseudodip-C2-N 


BCR-ABL-#3 


Hyperdip>50-C22 


Pseudodip-C3 


BCR-ABL-#4 


Hyperdip>50-C23 


Pseudodip-C5 


BCR-ABL-#5 


Hyperdip>50-C27-N 


Pseudodip-C6 


BCR-ABL-#6 


Hyperdip>50-C32 


Pseudodip-C7 


BCR-ABL-#7 


Hyperdip>50-R4 


Pseudodip-C9 


BCR-ABL-#8 


Hyperdip47-50-C14-N 


Pseudodip-C14 


BCR-ABL-#9 


Hyperdip47-50-C3-N 


Pseudodip-C16-N 


BCR-ABL-Hyperdip-#10


Hypodip-#2 


Pseudodip-Rl-N 


BCR-ABL-C1 


Hypodip-2M#l 


T-ALL-#5 


BCR-ABL-R1 


Hypodip-C2 


T-ALL-#6 


BCR-ABL-R2 


Hypodip-C5 


T-ALL-#7 


BCR-ABL-R3 


MLL-#1 


T-ALL-#8 


BCR-ABL-Hyperdip-R5 


MLL-#2 


T-ALL-#10 


E2A-PBXl-#5 


MLL-#3 


T-ALL-C2 


E2A-PBXl-#6 


MLL-#4 


T-ALL-C6 


E2A-PBXl-#9 


MLL-#5 


T-ALL-C7 


E2A-PBX1-#10 


MLL-#6 


T-ALL-C11 


E2A-PBX1-#12 


MLL-#7 


T-ALL-C15 
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E2A-PBX1-#13 


MLL-#8 


T-ALL-C19 


E2A-PBX1-2M#1 


MLL-2M#1 


T-ALL-C21 


E2A-PBX1-C2 


MLL-2M#2 


T-ALL-R5 


E2A-PBX1-C3 


MLL-C1 


T-ALL-R6 


E2A-PBX1-C4 


MLL-C2 


TEL-AMLl-#6 


E2A-PBX1-C5 


MLL-C3 


TEL-AMLl-#9 


E2A-PBX1-C6 


MLL-C4 


TEL-AML1-#10 


E2A-PBX1-C7 


MLL-C5 


TEL-AML1-#14 


E2A-PBX1-C9 


MLL-C6 


TEL-AML1-2M#1


E2A-PBX1-C10 


MLL-R1 


TEL-AML1-2M#2


E2A-PBX1-C11 


MLL-R2 


TT7T AA/TT 1 C*A 

1 bL- AML1 -U4 


E2A-PBX1-C12 


MLL-R3 


TT7T AA/TT 1 r 1 ^ 
1 bL-AM.Ll -LO 


E2A-PBX1-R1 


MLL-R4 


TEL-AML1 -C6 


Hyperdip>50-#8 


Normal-Cl-N 


TUT AA/TT 1 r^O^ 

1 bL-AMLl -L-Zo 


Hyperdip>50-#12 


Normal-C2-N 


1 bL-AMLl -LZS 


Hyperdip>50-#14 


Normal-C3-N 


IbL-AMLl-LoU 


Hyperdip>50-Cl 


Normal-C4-N 


TTJT AA/TT 1 

1 bL-AiVlLl -Lo 1 


Hyperdip>50-C4 


Normal-C7-N 


r TT7T AHJTT 1 POO 

TbL-AMLl -L il 


Hyperdip>50-C6 


Normal-C8 


TDT AA/TT 1 C-'X'X 

ILL-AMLI-Loj 


Hyperdip>50-C8 


Normal-C9 


TEL-AML1-C34


Hyperdip>50-Cll 


Normal-Cll-N 


TT7T A1VTT 1 CXI 


Hyperdip>50-C13 


Normal-Rl 


TEL-AML1-C38 


Hyperdip>50-C15 


Normal-R2-N 


TEL-AML1-C40 


Hyperdip>50-C16 


Pseudodip-#5 


TEL-AML1-R3 



* Subtype Name-C# Dx Sample of patient in CCR 

Subtype Name-R# Dx Sample of patient who developed a hematologic relapse 
Subtype Name-# Dx Sample used for subgroup classification only 
Subtype Name-2M# Dx Sample of patient who later developed 2nd AML 
Subtype Name-N Dx Sample in novel group 
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Table 62. Subgroup distribution of ALL cases 


Subgroup            Train Set   Test Set
BCR-ABL                 11          4
E2A-PBX1                13          5
Hyperdiploid >50        13          4
MLL                     15          5
T-ALL                   12          2
TEL-AML1                15          5
Other                   21          7
Total                  100         32



A total of 26,825 probe sets from the combined Affymetrix® brand U133A and B
microarrays (Affymetrix, Inc., Santa Clara, CA) showed variation in expression levels
across the 132 diagnostic leukemia samples. In an initial analysis of these data, two
complementary unsupervised clustering algorithms, two-dimensional hierarchical
clustering and principal component analysis (PCA), were used to assess the major
sub-groupings of the leukemia cases based solely on gene expression profiles. These
unbiased clustering algorithms demonstrated that the pediatric ALL cases cluster
primarily into seven major subtypes: T-ALL and 6 subtypes of B-cell lineage ALL
corresponding to (1) rearrangement in the MLL gene on chromosome 11q23, (2)
t(1;19)[E2A-PBX1], (3) hyperdiploid >50 chromosomes, (4) t(9;22)[BCR-ABL], (5)
the novel subgroup, and (6) t(12;21)[TEL-AML1]. In addition, a heterogeneous group
of B-lineage cases was identified that lacked any of the defined genetic lesions and
failed to cluster into the novel subgroup. Several of these leukemia subtypes formed
distinct branches when all differentially expressed genes were used in the
two-dimensional hierarchical clustering algorithm (T-ALL, hyperdiploid >50
chromosomes, and TEL-AML1), whereas other subtypes clustered in multiple
branches, suggestive of gene expression differences within these subclasses. Using
PCA, the distinct nature of the B-cell lineage subtypes is better appreciated when the
T-ALL cases are removed from the analysis. A diagnostic accuracy of 100% was
achieved for only two of the leukemia subtypes (T-ALL and TEL-AML1), indicating the
need to use supervised learning algorithms to achieve optimal diagnostic accuracy by
gene expression profiling.
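
By way of illustration only, the unsupervised grouping of cases by expression profile may be sketched as follows. The sketch clusters samples in one dimension only, and it assumes a Pearson correlation distance and average linkage; neither choice is stated in this section, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a flat profile carries no correlation information
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering of samples; returns lists of sample indices."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1: List[int], c2: List[int]) -> float:
        # Average pairwise distance between members of the two clusters.
        total = sum(pearson_distance(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Applied to the samples-by-probe-sets matrix, a call such as `average_linkage(profiles, 7)` would return seven groups of sample indices that could then be compared against the known subtype labels.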

Statistical methods were used to identify probe sets that were the best
discriminators of the individual leukemia subtypes. In order to identify the genes that
provide the highest accuracy in diagnosing specific prognostic subtypes of leukemia,
the decision tree format described elsewhere herein was used for the identification of
leukemia subtypes. Briefly, we first define whether a case is T- or B-cell in lineage.
If the case is classified as T-cell, a diagnosis of T-ALL is made. If non-T, we then
determine if the case can be classified into one of the known B-cell lineage risk
groups, deciding sequentially if it is E2A-PBX1, TEL-AML1, BCR-ABL, rearranged
MLL gene, and lastly hyperdiploid with >50 chromosomes. Cases not assigned to one
of these classes are left unassigned. The use of this decision tree format directly
influences the selection of genes, allowing the selection of discriminating genes for
groups lower down the tree that might also be expressed by subtypes higher in the
tree. Using a number of different supervised learning algorithms, it was found that a
higher diagnostic accuracy is obtained using this decision tree format, as compared to
a parallel format in which each class is identified against all others.
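
The sequential decision tree format described above may be sketched as follows. The order of decisions is taken from the text; the per-subtype classifiers, the probe names, and the cutoffs are hypothetical placeholders and do not represent the supervised learning algorithms actually used.

```python
from typing import Callable, Dict, Optional

Profile = Dict[str, float]               # probe set id -> expression signal
Classifier = Callable[[Profile], bool]   # binary "is this subtype?" decision

# Order of decisions as described in the text: T-ALL first, then the
# B-cell lineage subtypes in sequence.
DECISION_ORDER = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50",
]


def classify(profile: Profile, classifiers: Dict[str, Classifier]) -> Optional[str]:
    """Walk the tree; return the first subtype whose classifier accepts the
    case, or None when the case is left unassigned."""
    for subtype in DECISION_ORDER:
        if classifiers[subtype](profile):
            return subtype
    return None


def threshold_rule(probe: str, cutoff: float) -> Classifier:
    """Hypothetical stand-in for a trained classifier: one probe, one cutoff."""
    return lambda profile: profile.get(probe, 0.0) > cutoff


# Illustrative only: probe names and cutoffs are invented for the example.
toy_classifiers = {
    name: threshold_rule(f"probe_{i}", 100.0)
    for i, name in enumerate(DECISION_ORDER)
}
```

Because a case is removed from consideration as soon as one node accepts it, a classifier lower in the tree never has to distinguish its class from the subtypes above it, which is the property the text relies on when selecting genes.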

Discriminating genes were selected using a chi-square metric on the 100 cases
in the training set. Genes were selected that discriminated between a class and all
leukemia subtypes below it in the decision tree. The number of discriminating probe
sets per leukemia subtype at a statistical significance level of p < 0.001 (as determined
by a permutation test) was: T-ALL, 2063; E2A-PBX1, 1059; TEL-AML1, 805;
BCR-ABL, 201; MLL chimeric genes, 726; and hyperdiploid with >50 chromosomes,
994. The lists of discriminating genes obtained using the top 100 ranked probe sets for
the six prognostically important subgroups are contained in Tables 63-68. As multiple
probe sets for the same gene are present on Affymetrix microarrays, the top 100
ranked probe sets represent between 75 and 92 distinct genes, depending on the
leukemia subtype. As shown, distinct groups of either over- or under-expressed genes
distinguish cases defined by E2A-PBX1, MLL gene rearrangement, T-ALL,
hyperdiploid >50 chromosomes, BCR-ABL, and TEL-AML1.
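
A minimal sketch of chi-square scoring with a permutation test is given below. It assumes that each probe set is dichotomized at its median signal before a 2x2 chi-square statistic is computed, and that cases outside the listed classes count as below every node; the discretization actually used is not stated in this section, and all function names are hypothetical.

```python
import random
from statistics import median
from typing import List, Sequence, Tuple


def cases_at_or_below(labels: Sequence[str], order: Sequence[str], subtype: str) -> List[int]:
    """Indices of cases in `subtype` or in any class below it in the tree.
    Labels absent from `order` are treated as below every node (assumption)."""
    higher = set(order[:order.index(subtype)])
    return [i for i, lab in enumerate(labels) if lab not in higher]


def chi_square_2x2(values: Sequence[float], in_class: Sequence[bool], threshold: float) -> float:
    """Chi-square statistic for class membership versus signal above threshold."""
    a = sum(1 for v, c in zip(values, in_class) if c and v > threshold)
    b = sum(1 for v, c in zip(values, in_class) if c and v <= threshold)
    c_hi = sum(1 for v, c in zip(values, in_class) if not c and v > threshold)
    d = sum(1 for v, c in zip(values, in_class) if not c and v <= threshold)
    n = a + b + c_hi + d
    margins = (a + b) * (c_hi + d) * (a + c_hi) * (b + d)
    if margins == 0:
        return 0.0  # a degenerate table carries no information
    return n * (a * d - b * c_hi) ** 2 / margins


def permutation_p_value(values: Sequence[float], in_class: Sequence[bool],
                        n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed statistic and the fraction of label shuffles that match or beat it."""
    rng = random.Random(seed)
    threshold = median(values)
    observed = chi_square_2x2(values, in_class, threshold)
    labels = list(in_class)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if chi_square_2x2(values, labels, threshold) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Resolving a significance level of p < 0.001 in this way requires well over 1000 permutations per probe set, since the smallest attainable p value is 1/(n_perm + 1).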

The following tables contain a list of the top 100 probe sets for each diagnostic
subtype, ranked by their chi-square value. Each table contains the Affymetrix® U133
series probe set number, a gene description, gene symbol, chromosomal location, and
primary GenBank reference. Chi-square values were calculated utilizing only the
samples in the training set in a differential diagnosis decision tree format. The
calculation of the fold change was done in a parallel format using the total data set
and comparing the mean signal value in the class versus the mean signal value in the
non-class.
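
The fold change calculation may be sketched as follows. The sketch assumes that the ratio is reported with the larger mean in the numerator and that the direction is carried by the separate above/below mean column; this reading is inferred from the layout of the tables rather than stated in the text, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fold_change(signals: Sequence[float], in_class: Sequence[bool]) -> Tuple[str, float]:
    """Compare the mean signal in the class with the mean signal in the non-class.

    Returns ("Above" or "Below", ratio), with ratio >= 1. Signals are assumed
    to be positive so that both means are non-zero.
    """
    class_mean = mean(s for s, c in zip(signals, in_class) if c)
    other_mean = mean(s for s, c in zip(signals, in_class) if not c)
    if class_mean >= other_mean:
        return "Above", class_mean / other_mean
    return "Below", other_mean / class_mean
```

Here `signals` holds one probe set's signal across the total data set and `in_class` marks the cases belonging to the subtype of interest, in keeping with the parallel (class versus all others) format described above.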



Table 63. Top 100 chi-square probe sets selected for BCR-ABL 



U133 probe 



at EST FLJ39877 
at Paraoxonase/ 
arylesterase 2 
201028_s_at Antigen identified 
by monoclonal 
antibodies 12E7, 
F21 and 013 
_s_at CyclinD2 
_s_at Glycophorin C 

integral membrane 
glycoprotein 
Semaphorin 6A 
_s_at Antigen identified 
by monoclonal 
antibodies 12E7, 
F21 and 013 
204429_s_at Solute carrier 
family 2 
(facilitated 
glucose/fructose 



241812 
201876 



200953_s 
202947_£ 



223449_s 
201029_s 



9 210830_s_at Paraoxonase 

10 215028_at Semaphorin 6A 

11 220024_s_at 



12 201906_s_at HYA22 protein 

13 209365_s_at Extracellular 

matrix protein 1 

14 238689_at GPR110G 

protein-coupled 
receptor 1 10 

15 222154_s_at 

DKFZP564A2416 
unknown protein 
with a histone H5 



218084_x_at FXYD 

containing 



17 212242_at 



201445_at 
20277 l_at 



transport regulator 
5 

Tubulin, alpha 1 
(testis specific) 
Calponin 3, acidic 
KIAA0233 gene 
product 



20 212298_at Neuropilin 1 











Bcr 






Chromo- 




Chi- 


above/ 




Gene 


somal 


GenBank 


square 


below 


Fold 


symbol 


location 


Reference 


value 


mean 


change 


FLJ39877 


2 


AV648669 


47.4 


Above 


5.2 


PON2 


7q21.3 


NM_000305.1 


47.2 


Above 


18.7 


MIC2 


Xp22.32 


U82 164.1 


44.3 


Above 


2.6 


CCND2 


12pl3 


NM 001759.1 


42.3 


Above 


3.5 


GYPC 


2ql4-q21 


NM_002101.2 


42.3 


Above 


3.1 


SEMA6A 5q23.1 


AF225425.1 


42.3 


Above 


4.3 


MIC2 


Xp22.32 


NM 002414.1 


41.2 




2.4 


SLC2A5 


lp36.2 


BE560461 


41.2 


Above 


5 


PON2 


7q21.3 


AF00 1602.1 


41.2 


Above 


23.6 


SEMA6A 


5 


AB002438.1 


41.2 


Above 


4.5 


PRX 


19ql3.13 


NM 020956.1 


41.2 


Above 


8.2 




-ql3.2 










HYA22 


3p21.3 


NM 005808.1 


41.1 


Above 


43.4 


ECM1 


lq21 


U65932.1 


41.1 


Above 


6 


GPR110 


6 


BG426455 


41.1 


Above 


10.9 


DKFZP56 


2q33.1 


AK002064.1 


40.4 


Above 


12.4 


i 4A2416 










FXYD 5 


19ql2- 


NM_014164.2 


38 


Above 


1.5 


r 


ql3.1 










TUBA1 


2q36.2 


AL565074 


37 


Above 


3.2 


; CNN3 


Ip22-p21 


NM 001839.1 


36.3 


Above 


10.8 




16q24.3 


NM_014745.1 


36.3 


Above 


1.9 


KIAA023 












3 

NRP1 


10pl2 


BE620457 


36.3 


Above 


13.8 
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212458_at FLJ21897 
222488_s_at Dynactin4 
222762_x_at LIM domains 
1 



24 20095 l_s_at CyclinD2 



25 204430_s_at 



205467_at 
225660_at 
225913_at 



236489_at 
240173_at 
240499_at 



Solute c; 
family 2 
(facilitated 
glucose/fructose 
transporter), 
member 5 
Caspase 10 
Semaphorin 6A 
FLJ21140 
(Ser/Thr protein 
kinase) 
EST 
EST 
EST 



201310_s_at P3 11 protein. 

Similar to 



215617_at 
242579_at 



gastrin/cholecysto 
kinintypeB 
receptor. 
FLJ11754 
EST 

202717_s_at CDC16cell 

division cycle 16 
homolog 
205055_at Integrin, alpha E 
(antigen CD 103, 
human mucosal 
lymphocyte 
antigen 1) 
217967_s_at Chromosome 1 

ORF24 
20165 6_at Integrin, alpha 6 
207196_s_at Nef-associated 
factor 1 

40 219315_s_at hypothetical 

protein FLJ23058 

41 202123_s_at V-ablAbelson 

murine leukemia 
viral oncogene 
homolog 1 

42 219938_s_at Pro-Ser-Thr 



39 



interacting protein 
2 

EST;DKFZp434P 

0235 

Immune 



nucleotide 4 like 1 
F-box and WD-40 
domain protein 7 
(archipelago 
homolog, 
Drosophila) 



FLJ21897 


2 


AW138902 


36.3 


Above 


2.4 


DCTN4 


5q31-q32 BE2 18028 


36.3 


Above 


3.6 


LIMD1 


3p21.3 


AU144259 


36.3 


Above 


2.6 


CCND2 


12pl3 


NM 001759.1 


35.3 


Above 


12.7 


SLC2A5 


lp36.2 


NM_003039.1 


35.3 


Above 


5.1 


C ASP 1 0 


2q33-q34 NM_001230.1 


35.3 


Above 


3.6 


SEMA6A 


5q23.1 


W92748 


35.3 


Above 


3.3 


FLJ21140 


15 


AK025943.1 


35.3 


Above 


2.9 




6 


AI282097 


35.3 


Above 


16.7 




4 


AI732969 


35.3 


Above 


10.3 




10 


AA482221 


35.3 




1.3 


P311 


5q21.3 


NM_004772.1 


35.2 


Below 


2.2 


FLJ11754 


2 


AU145711 


35.2 


Above 


14.4 




4 


AA935461 


35.2 




10.2 


CDC16 


13q34 


NM_003903.1 


34.4 


Above 


1.1 


ITGAE 


17pl3 


NM_002208.3 


34.4 


Below 


2.1 


Clorf24 


lq25 


AF288391.1 


34.4 


Above 


3.2 


ITGA6 


2q31.1 


NM 000210.1 


33.9 


Above 


2.8 


NAF1 


5q32- 


NM_006058.1 


32.2 


Above 


1.4 




q33.1 








5.3 


FLJ20898 


; 16 P 13.12 NM_024600.1 


32.2 


Above 


ABL1 


9q34.1 


NM_005 157.2 


31.4 


Above 


1.8 


PSTPIP2 


18ql2 


NM_024430.1 


31.2 


Above 


5 


DKFZp4 


4 


AA741243 


31.2 


Above 


1.1 


34P0235 










3.3 


IAN4L1 


7q36 


AI435089 


30.9 


Above 


FBXW7 


4q31.23 


BE551877 


30.5 


Above 


2.4 
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46 229975_at EST 

47 200864_s_at RAB11A 

48 203089_s_at Protease, serine, 

25 

49 205376_at Inositol 

polyphosphates- 
phosphatase, type 
II 

50 209229_s_at KIAA1115 

protein 

51 219871_at Hypothetical 

protein FLJ13 197 

52 222868_s_at Interleukin 18 

binding protein 

53 235988_at GPR110G 

protein-coupled 
receptor 110 

54 239273_s_at Matrix 

metalloproteinase 
28 

55 206150_at Tumor necrosis 

factor receptor 
superfamily, 
member 7 

56 212203_x_at Interferon induced 

transmembrane 
protein 3 

57 217110_s_at Mucin 4 

58 223075_s_at hypothetical 

protein FLJ12783 

59 229139_at EST 

60 229367_s_at Hypothetical 

proteins 
FLJ22690. 

61 213093_at FLJ30869 

62 216033_s_at FYN oncogene 

related to SRC 

63 202369_s_at TRAM-like 

protein 

64 212592_at immunoglobulin J 

polypeptide, linker 
protein for 
immunoglobulin 
alpha and mu 
polypeptides 

65 219218_at hypothetical

protein FLJ23058 

66 242051_at EST

67 200655_s_at Calmodulin 1 

(phosphorylase 
kinase, delta) 

68 202794_at Inositol 

polyphosphate-1- 
phosphatase 

69 218348_s_at HSPC055 protein 

70 205269_at Lymphocyte 

cytosolic protein 2 








4 


AI826437 


30.5 


Above 


9.1 


RAB11A 


15q21.3- 


NM_004663.1 


29.7 


Above 


1.4 




q22.31 










PRSS25 


2p12


XTA/1 Ci-IIIAH 1 

1N1V1 UJoZ^f/.l 








INPP4B 


4q31.1 


NM_003866.1


29.7 


Above 


12.4 




19q13.42


BC002799.1 


29.7 


Above 


1.3 


KIAA111 












FLJ13197 


4p14


NM_024614.1 


29.7 


Above 


14.5 


IL18BP 


11q13


AI521549 


29.7 


Above 


7.1 




6p12.3


AA746038 








MMP28 


17q11-


AI927208 


29.7 


Above 


90.5 




q21.1 












12p13


NM_001242.1 


29.5 


Above 


3.2 


TNFRSF7 












rFrrM3 


8 13 1 
q 




29 5 


Ab 


2 3 


MUC4 


3q29 


AJ242547.1 


29.5 


Above 


47.5 


FLJ12783 


9q34.13- 


AL136566.1 


29.5 


Above 


3.9 




q34.3 












8 


AI202201 


29.5 


Above 


10.8 


FLJ22690 


7 


AW130536 


29.5 


Above 


3.6 


FLJ30869 Xq28 


AI471375 


29.1 


Above 


2.5 


FYN 


6 


S74774.1 


29.1 


Above 


2.7 


KIAA005 


6p21.1- 


NM_012288.1 


28.7 


Above 


3.3 


7 


pl2 










IGJ 


4q21 


AV733266 


28.7 


Above 


7.9 



FLJ23058 17q25.3 
Y 

CALM1 14q24- 
q31 

INPP1 2q32 



HSPC055 16p13.3
LCP2 5q33.1- 
qter 



NM_024696.1 28.7 

AI695695 28.7 

NM_006888.1 28.5 

NM_002194.2 28.4

NM_014153.1 27.7 

AI123251 26.9 



Below 6.2 

Above 2.2 

Above 1.3

Above 1.6

Below 1.1 

Above 1.6
71 238488_at

72 202242_at 

73 218764_at 

74 224811_at 

75 225799_at 



Ran binding 
protein 1 1 



superfamily 
member 2 
Hypothetical 
protein MGC5: 
FLJ30652 
Hypothetical 



77 203508_at Tumor necrosis 

factor receptor 
superfamily, 
member IB 

78 208071_s_at Leukocyte-

associated Ig-like 
receptor 1 



80 226345_at DKFZp43401317 



81 200863_s_at RAB11A, 

RAS oncogene 
family 

82 205270_s_at Lymphocyte 

cytosolic protein 2 

83 208881_x_at Isopentenyl- 

diphosphate delta 



84 212862_at 



85 213385_at 

86 218013_x_at Dynactin 4

87 218966_at Myosin 5C 

88 200742_s_at Ceroid- 

lipofuscinosis, 
neuronal 2, late 
infantile (Jansky- 
Bielschowsky 
disease). A 



CDP- 

diacylglycerol 
synthase 
(phosphatidate 
cytidylyltransferase) 2





5q12.2


BF511602 


26.9 


Above 


2.7 


t np^i 1 o 










TM4SF2 


X 11 4 


NM_004615.1


26.6 


Above 


1.7 




14q22.1- 


NM_024064.1 


26.6 


Above 


1.7 


MGC5363 


q22.3 








1.5 






BF112093


26.6 


Above 




2q12.3


BF209337 


26.6 


Above 


2.2 


MGC4677 










4.7 






AI807004 


26.6 






1p36.3-


NM_001066.1 


26 


Above 


2.6 


TNFRSF1B


p36.2 










LAIR1 


19q13.4


NM_021708.1 


26 


Above 


2 


ADCY3 


2p24-p22 AF033861.1 


26 


Above 


2.1 




10 


AW270158 


26 


Below 


1.4 














401317 








Above 


1.4 


RAB11A


15q21.3- 


AI215102 


25.8 




q22.31 










LCP2 


5q33.1- 


NM_005565.2 


25.8 


Above 


1.6 




qter 








1.7 


roil 


10p15.3


BC005247.1 


25.8 


Below 


CDS2 


20p13


AL568982 


25.8 




1.8 


I 

CHN2


7 


AK026415.1 


25.8 


Above 


3 


DCTN4 


5q31-q32 


NM_016221.1


25.8 


Above 


3.6 


MYO5C


15q21 


NM_018728.1


25.8 


Above 


1.8 


CLN2 


11p15


BG231932 


25 


Above 


1.5 



NM_003896.1 25 
NM_000901.1 25 



insensitive 
lysosomal 
peptidase. 

89 203217_s_at Sialyltransferase 9 SIAT9 2p11.2

90 205259_at Nuclear receptor NR3C2 4q31.1 

subfamily 3, 
group C, member 
2 

91 220684_at T-box 21 TBX21 17q21.2 NM_013351.1 25

92 225244_at IMAGE3451454: IMAGE34 1q42.13 AA019893 25

GRASP protein 51454 




Above 
Above 



Above 
Above 
93 239519_at EST

94 203005_at Lymphotoxin beta 

receptor (TNFR 
superfamily, 
member 3) 

95 200665_s_at Secreted protein, 

acidic, cysteine- 
rich (osteonectin) 

96 204004_at PRKC, apoptosis, 

WT1, regulator 

97 204576_s_at KIAA0643 

protein 

98 214255_at ATPase, Class V, 

type 10C 

99 216985_s_at Syntaxin 3A

100 48106_at FLJ20489 



LTBR 


10 

12p13


AA927670 
NM_002342.1 


25 
24.3 


Above 
Above 


18.2 


SPARC 


5q31.3- 
q32 


NM_003118.1


24.3 


Above 


9.8 


PAWR 


12q21 


AI336206 


24.3 


Above 


3 


KIAA064 


16p12.3


AA207013 


24.3 


Above 


2 


3 

ATP10C


15q11-
q13

11q12.3
12p11.1


AB011138.1 


24.3 


Above 


9.9 


STX3A 
FLJ20489 


AJ002077.1 
H14241 


24.3 
24.3 


Above 
Above 


12 

2.8 



Table 64. Top 100 chi-square probe sets selected for E2A-PBX1 
U133 probe set   Gene description   Symbol   Chromosomal location   GenBank reference   Chi-square value   E2A above/below mean   Fold change



2 201695_s_at 

3 204674_at 

4 205253_at 



FAT 
suppressor 
homolog 
(Drosophila) 



tumor FAT 4q34-q35 NM_005245.1 88.0 Above 9.9 



212371_at
219155_at 



10 227439 at 



phosphorylase 
lymphoid- 



NP 14q13.1
LRMP 12p12.3



NM_000270.1
NM_006152.1



Above 3.8 
Above 5.8 



membrane protein 

pre-B-cell 

leukemia 

transcription 

factor 1 

pre-B-cell 

leukemia 



PBX1 1q23 NM_002585.1 88.0 Above 3549.2



PBX1 1q23 BF967998 88.0 Above 5283.5



factor 1, splice 

variant 

pre-B-cell 

leukemia 

transcription 

factor 1, splice 

variant 

DKFZp586C1019 
retinal 

degeneration B 
beta 

hypothetical 
protein 
MGC10485 
E2a-Pbx1-
associated protein 



PBX1 1q23 BF967998 88.0 Above 7472.2



DKFZp58 1 
6C1019 

RDGBB 17q24.2 



MGC10485 11q25



AL049397.1 
NM_012417.1

AI971602 

AW005572 



Above 2.5 

Above 2.7 

Above 7.7 

Above 269.8 






11 227949_at

12 230306_at 



Q9H4T4 like 

hypothetical 

protein 

MGC10485 

retinal 



H17739 20q13.32
MGC1048 11q25



AL357503 
AA514326



14 203372_s_at 

15 206028_s_at 



B 
beta 

STAT induced SOCS2 
STAT inhibitor-2 
c-mer proto- MERTK 
oncogene tyrosine 



signaling 
lymphocytic 
activation 
molecule 

homolog of yeast 
long chain 
polyunsaturated 
fatty acid 
elongation 



RDGBB 17q24.2 AW193811 



12q 
2q14.1



AB004903.1 
NM_006343.1 



SLAM 1q22-q23 NM_003037.1



58.0 Above 59.3 
58.0 Above 19.2 



8.0 Above 25.6 

;0.6 Below 23.4 

10.6 Above 23.7 

!0.6 Above 6.3 



6p21.1- 
p12.1



AL136939.1 80.6 Above 2.2 



18 209760_at 

19 35974_at 



21 208644_at 



KIAA0922 
protein 
lymphoid- 
restricted 



KIAA092 
2 

LRMP 
HIP12 
ADPRT 



e protein 
a 

interacting protein 
12 

ADP- 

ribosyltransferase 
(NAD+; poly 
(ADP-ribose) 
polymerase) 

22 212789_at KIAA0056 KIAA005 

protein 6 

23 221113_s_at wingless-type WNT16 

MMTV 

integration site 
family, member 
16 

24 224022_x_at wingless-type WNT16 

MMTV 

integration site 
family, member 
16 

25 231040_at EST 

26 232289_at FLJ14167 FLJ14167 

27 235666_at EST FLJ20489 

28 203373_at STAT induced SOCS2 

STAT inhibitor-2 

29 210785_s_at basement ICB-1 

membrane- 
induced gene 

30 224733_at chemokine-like CKLFSF3 

factor super 
family 3 

31 225235_at hypothetical MGC1485 



4q31.23 


AL136932.1 


80.6 


Above 


2.9 


12p12.3


U10485 


80.6 


Above 


6.2 


12q24 


AB014555 


80.6 


Above 


3.8 


1q41-q42 M32721.1


80.2 


Above 


3.0 


11q25


AI796581 


80.2 


Above 


3.9 


7q31 


NM_016087.1 


80.2 


Above 


2547.6 


7q31 


AF169963.1 


80.2 


Above 


569.1 


9 


AW512988


80.2 


Above 


16.4 


17 


BF237871 


80.2 


Above 


144.1 


10 


AA903473 


80.2 


Above 


654.6 


12q 


NM_003877.1 


74.2 


Below 


24.8 


1p35.3


AB035482.1 


74.2 


Below 


4.1 


16q23.1 


AL574900 


74.2 


Below 


41.7 


5q35.3


AW007710 


74.2 


Above 


3.6 
32 204114_at

33 211913_s_at 

34 219551_at 

35 223693_s_at 

36 200600_at 



MGC14859 

nidogen 2 NID2 

(osteonidogen) 

c-mer proto- MERTK 

oncogene tyrosine 

kinase 

uncharacterized BM040
bone marrow 
protein BM040 
hypothetical FLJ10324 
protein FLJ10324 



14q21- NM_007361.1 73.1 
q22 

2q14.1 L08961.1 72.8



Above 
Above 



MSN 



FLJ12280 
acyl-Coenzyme A 
dehydrogenase 
family, member 8 
39 235911_at ESTs, Weakly
similar to PIHUB6 
salivary proline- 
rich protein 
precursor PRB1 
(large allele) 
ESTs 

DKFZp686D0521 



37 213909_at 

38 221669_s_at 



40 243533_x_at 

41 202615_at



FLJ12280
ACAD8 



7p22 AL136731.1 72.8 

Xq11.2- NM_002444.1 72.5
q12

3 AU147799 72.5 

11q25 BC001964.1 72.5



Above 
Below 



Above 
Above 



AI885815 72.5 Above 



42 204774_at 

43 218283_at 

44 209130_at 

45 228580_at 

46 202796_at 

47 218640_s_at 

48 235099_at 



51 202208_s_at 

52 205173_x_at 



ecotropic viral
integration site 2A 
synovial sarcoma 
translocation gene 
on chromosome 
18-like 2 
synaptosomal- 
associated protein, 
23kDa 

serine protease 

HTRA3 

synaptopodin 

phafin 2 

ESTs, Weakly 
similar to 
PLLP HUMAN 
Plasmolipin 
[H.sapiens] 
family with 
sequence 

similarity 3, 
member C 
golgi autoantigen, 
golgin subfamily 
a, 3 

ADP-ribosylation 
factor-like 7 
CD58 antigen, 
(lymphocyte 
function- 



DKFZp68 

6D0521 

EVI2A 



H09663 
BF222895 



72.5 
68.6 



17q11.2 NM_014210.1 68.6
3p21 NM_016305.1 68.6 



Above 
Below 



Below 
Above 



SNAP23 15q14 BC003686.1 67.8 Below



HTRA3 

KIAA102 
9 

FLJ13187 



4p16.1 AI828007 66.6

5q33.1 NM_007286.1 66.5 

8q21.3 NM_024613.1 66.5 

3 AW080832 66.5 



Above 
Above 



Above 
Above 



ARL7 
CD58 



2q37.2 BC001051.1 65.3 
1p13 NM_001779.1 65.3



15.1 
37.7 



3q21.1 NM_018456.1 72.8 Above 3.0



7q22.1- NM_014888.1 65.3 
q31.1 



GOLGA3 12q24.33 NM_005895.1 65.3 Above



Above 
Above 



3)








?° 58 h anti S en > 








^ ymp yi 
















associated antigen 








3) 






t 

- 




HPCAL1 


55 


21335o_at 


ttta Ansno 

JSJAAU8UZ 


TiAJ\J\\Jo\J 






protein 


2 


56 


222699_s_at


phafin 2


FLJ13187 


57 


225618_at 


EST 




58 


Zoo / fo at 


DKFZp451L157 


DKFZp45 








1L157 


g 


239427_at


ESTs 






4 / uoy_ai 


Rho GTPase ARHGAP 






activating protein 8 


61 


205769_at


solute carrier 


SLC27A2 






family 27 (fatty 








acid transporter), 








member 2 






210786_s_at


Friend leukemia 


FLI1






virus integration 1 




63 


zizyoj_at 


DKFZp434E033 


DKFZp43 








4E033 


64 


227441_s_at


E2a-Pbx1-


EB-1 






associated protein 




65 


zj4zoi at 


DKFZp761M1012 DKFZp76 






1 


1M10121 


66 


244565_at 


ESTs 




67 


^fPISI at 

zuzio i_at 


KIAA0247 gene KIAA024 






product 


7 


68 


202207_at 


ADP-ribosylation 


ARL7 






factor-like 7 




69 


zu / j / 1 x_ai 


basement 
membrane- 


ICB-1 






induced gene 






zuyjjo_s_at 


huntingtin 


HIP12 






interacting protein 
12 




71 


213005_s_at


KIAA0172 


KIAA017 






protein 


2 


72 


IIAUSA it 

z^0ojH__ai 


cDNA 


DKFZp66 






DKFZp667F0617 


7F0617 


73 


226233_at 


tubulin-specific 


TBCE 






chaperone e 






203435_s_at


membrane 


MME 






metallo- 








endopeptidase 








(neutral 








endopeptidase, 
enkephalinase,








CALLA, CD10)




75 


202478_at


GS3955 protein 


GS3955 


76 


202479_s_at


GS3955 protein 


GS3955 


77 


203999_at 


synaptotagmin I 


SYT1 


78 


212149_at 


KIAA0143 


KIAA014 






protein 


3 



1p13


BC005930.1 


65.3 


Above 


2.5 




BE617588 


65.3


Below 


2.6 


18p11.21


AB018345.1 


65.3 


Above 


12.7 


8q21.3 


BF439250 


65.3 


Above 


3.5 


17 


AI769587 


65.3 


Below 


5.3 


10 


AI244661 


65.3 


Above 


23.5 


1 


AA131524 


65.3 


Above 


13.7 


22q13.31


AA533284 


65.3 


Above 


3.3 


15q21.2 


NM_003645.1 


65.1 


Above 


56.0 


11q24.1-


M93255.1 


65.1 


Above 


2.2 












4 


BF115739


65.1 


Above 


7.1 


12 


AW005572 


65.1 


Above 


1139.4 


12 


AL137313.1 


65.1 


Above 


960.8 


10 


AI685824 


65.1 


Above 


7.6 


14q24.1 


NM_014734.1 


63.7 


Above 


1.8 


2q37.2 


NM_005737.2 


63.7 


Above 


3.2 


1p35.3


NM_004848.1 


63.7 


Below 


4.4 


12q24 


AB013384.1 


61.1 


Above 


23.8 


9p24.3 


D79994.1 


61.1 


Above 


8.3 


20 


AA743694 


61.1 


Above 


12.6 


1q42.3


BG112197


60.0 


Above 


2.6 




NM_007287.1


59.9 


Below 


2.2 


1 ■ 










2p25.1 


NM_021643.1


59.3 


Above 


4.0 


2p25.1 


BC002637.1 


59.3 


Above 


3.3 


12cen- 


NM_005639.1 


59.3 


Above 


3.9 


q21 










8q24.12 


AA805651 


59.3 


Below 


13.5 



81 224856_at

82 200811_at 

83 201722_s_at 



84 223711_s_at

85 233273_at 



19p13.3
6q21 



6p21.3- 

21.2 

19p13.3



(GalNAc-T1)

HSPC144 protein HSPC144 11q25
cDNA FLJ12010 FLJ12010 1 

fis 

mitogen-activated MAPKAP 1q32
protein kinase- K2 
activated protein 
kinase 2 

immunoglobulin IGSF3 1p13
superfamily, 



minor HA-1 

histocompatibility 

antigen HA-1 

p53 regulated PA26 

PA26 nuclear 

protein 

FK506 binding FKBP5 
protein 5 

cold inducible CIRBP 
RNA binding 
protein 

UDP-N-acetyl- GALNT1 
alpha-D- 

galactosamine:pol 
ypeptide N- 
acetylgalactosami 

1 



88 217983_s_at 

89 218087_s_at 

90 218491_s_at 

91 201825_s_at 

92 202206_at 

93 218683_at 

94 226590_at 

95 227440_at 

96 229770_at 

97 40148_at 



98 212959_s_at 

99 203143_s_at 



100 209683_at 



ribonuclease 6 RNASE6P 6q27 
H3 

domain containing 



10q23.3- 
q24.1 



HSPC144 protein HSPC144 
CGI-49 protein LOC5 1 09 



11q25
1q44



ADP-ribosylation ARL7 2q37.2 
factor-like 7 

polypyrimidine PTBP2 lp22.11- 
tract binding p21.3 
protein 2 

cDNA clone 9 
EUROIMAGE 
1517766 

E2a-Pbx1- EB-1 12

associated protein 

hypothetical FLJ31978 12q24.33 

protein FLJ31978

amyloid beta (A4) APBB2 4p14
precursor protein- 
binding, family B, 
member 2 (Fe65- 
like) 

MGC4170 protein MGC4170 12q23.1 
KIAA0040 gene KIAA004 1q24-25
product 0 
hypothetical DKFZP56 2p24.2 

protein 6A1524 
DKFZp566A1524 




BE349017 


59.3 


Below 


2.9 


"MA/f fllAA^A 1 

JNIVI Ul^OH.l 




Below 


4.7 


AL122066.1


59.3 


Below 


5.5 


NM_001280.1 






5.8 


NM_020474.2 


59.1 


Below 


1.8 


AF182413.1 


59.1 


Above 


2.0 


AU146834 


59.1 


Above 


30.6 


AI141802 


57.9 


Above 


2.1 


AB007935.1 






4.4 


NM_003730.2 


57.9 


Below 


3.4 


NM_015385.1 


57.9 


Above 


25.1 


NM_014174.1






1.4 


AL572542 


57.8 


Above 


2.2 


JNM_OOD 15 1. 1 






3.9 


NM_021190.1


57.8 


Above 


1.8 


AA031404 


57.8 


Above 


3.1 


AW005572 








1 AI041543 


57.8 


Above 


51.8 


U62325 


57.8 


Above 


6.2 


AK001821.1 


57.2 


Below 


3.0 


T79953 


56.3 


Above 


2.4 


AA243659 


56.3 


Below 


10.0 
Table 65. Top 100 chi-square probe sets selected for Hyperdiploid >50

U133 probe set   Gene description   Symbol   Chromosomal location   GenBank reference   Chi-square value   Above/below mean   Fold change



Xq11.2- NM_002444.1 34.0 Above
q12



Moesin 
(membrane- 
organizing 
extension spike
protein) 

2 200737_at Phosphoglycerate PGK1 

kinase 1 

3 200980_s_at Pyruvate PDHA1 

dehydrogenase 
(lipoamide) alpha 
1 

4 201136_at Proteolipid protein PLP2 

2 (colonic 
epithelium- 
enriched) 

5 201807_at Vacuolar protein VPS26 

sorting 26 (yeast) 

6 202214_s_at Cullin 4B CUL4B

7 202557_at Stress 70 protein STCH 

chaperone, 
microsome 
associated, 60 kD 

8 202593_s_at membrane MIR16 

interacting protein 
of RGS16

9 203680_at Protein kinase, PRKAR2 

cAMP-dependent, B 
regulatory, type II, 
beta 

10 204194_at BTB and CNC BACH1 

homology 1, basic 
leucine zipper 
transcription 
factor 1 

11 205324_s_at FtsJ homolog 1 FTSJ1

(E. coli) 

12 208598_s_at Upstream UREB1 

regulatory element 
binding protein 1 

13 208861_s_at Alpha ATRX 

thalassemia/mental retardation
syndrome X- 
linked (RAD54 
homolog, S. 
cerevisiae) 

14 211342_x_at trinucleotide TNRC11 Xq13 BC004354.1 34.0 Above 1.8

repeat containing 

II (THR- 
associated protein, 
230 kDa subunit) 



Xq13 NM_000291.1 34.0 Above 1.8

Xp22.2- NM_000284.1 34.0 Above 1.7 
p22.1 



Xp11.23 NM_002668.1 34.0 Above



10q21.1 NM_004896.1 34.0 Above 1.7 

Xq23 NM_003588.1 34.0 Above 1.9 
21q11 AI718418 34.0 Above 2.0



16p12- NM_016641.1 34.0 Below 1.6
p11.2

7q22- NM_002736.1 34.0 Above 3.3 
q31.1 

21q22.11 NM_001186.1 34.0 Above 1.8



Xp11.23 NM_012280.1 34.0 Above 2.1
Xp11.22 NM_005703.2 34.0 Above 1.6



Xq13.1- U72937.2 34.0 Above
q21.1 






15 216071_x_at Trinucleotide TNRC11
repeat containing 



16 218573_at APR-1 

protein/melanoma
-associated 



Xq13 AF132033
MAGEH1 Xp11.22 NM_014061.1 34.0



34.0 Above 1.8 



17 219485_s_at proteasome 

(prosome, 
macropain) 26S 
subunit, non-
ATPase, 10 

18 200655_s_at Calmodulin 1 CALM1 

(phosphorylase 
kinase, delta) 

19 200738_s_at Phosphoglycerate PGK1 



PSMD10 Xq22.3 NM_002814.1 34.0 Above 2.4 



20 200944_s_at High-mobility 

group (nonhistone
chromosomal) 
protein 14; 
member of the 
HMG 14/17 
family 

21 201092_at Retinoblastoma 

binding protein 
7/RbAp46 

22 201100_s_at Ubiquitin specific USP9X 



TPD52 
UBE2A 



enzyme E2A 
(RAD 6 homolog) 
25 202325_s_at ATP synthase, H+ ATP5J 



14q24- NM_006888.1 30.1 
q31 



Xq13 NM_000291.1 30.1
21q22.2 NM_004965.1 30.1 



23 201688_s_at Tumor protein 

D52 

24 201899_s_at Ubiquitin- 



Xp11.4 NM_004652.2 30.1

8q21 BE974098 30.1 

Xq24- NM_003336.1 30.1 
q25 

21q21.1 NM_001685.1 30.1 



Xq28 



mitochondrial F0 
complex, subunit 
F6 

26 202829_s_at Synaptobrevin- SYBL1 

like 1 

27 202854_at Hypoxanthine HPRT1 Xq26.1 NM_000194.1 

phosphoribosyltra 
nsferase 1 (Lesch- 
Nyhan syndrome) 

28 206846_s_at Histone HDAC6 Xp11.23 NM_006044.2

deacetylase 6 

29 209370_s_at SH3-domain SH3BP2 4p16.3 AB000462.1

binding protein 2 

30 209565_at zinc finger protein ZNF183 Xq25- BC000832.1 

183 q26 

31 212846_at KIAA0179 KIAA017 21q22.3 D80001.1 

protein. 9 

32 217356_s_at Phosphoglycerate PGK1 Xq13 S81916.1



Above 
Above 



RBBP7 Xp22.31 NM_002893.2 30.1 Above 



34 218386_x_at Ubiquitin specific USP16 
protease 16; de- 



30.1 


Above 


1.8 


30.1 


Above 


1.6 


30.1 


Above 


1.5 


30.1 


Above 


1.4 


30.1 


Above 


1.5 


30.1 


Above 


3.1 


30.1 


Above 


2.2 


30.1 


Above 


2.0 


30.1 


Above 


1.8 


30.1 
30.1 


Above 
Above 


1.8 
1.7 






ubiquitinates 
histone H2A; 
ubiquitous 



35 218402_s_at 

36 218495_at 

37 218499_at 



38 218757_s_at 

39 219038_at 

40 229967_at 



41 242794_at 

42 201132_at 



43 201312_s_at 



Hermansky- HPS4 

Pudlak syndrome 

4 

Ubiquitously- UXT 

expressed 

transcript 

Mst3 and SOK1- MST4 
related 

kinase/STE20-like 
kinase; contains a 
Ser/Thr protein 
kinase domain 
Similar to yeast 
UpG, variant B 
Hypothetical 
protein FLJ11565 
Chemokine-like 
factor super 
family 2. 
EST 

Heterogeneous 
nuclear 

ribonucleoprotein 
H2 (F) 
SH3 domain 
binding glutamic 
acid-rich protein 



p11.22
Xq26.1 



UPF3B Xq25- 
q26 

FLJ11565 Xq22.2 
CKLFSF2 16q23.1 



4q31.1 
HNRPH2 Xq22 



SH3BGR Xq13.3



like 



44 201894_s_at 



45 201923_at 

46 202371_at



47 203126_at 



L 

DCN 



Decorin; 
glycoprotein that 
binds to type I 
collagen fibrils & 
plays a role in 
matrix assembly. 
Peroxiredoxin 4 
Hypothetical 
protein FLJ21174
Inositol(myo)-l(or IMPA2 

4 )" 

monophosphatase 



PRDX4 Xp22.13 
FLJ21174 Xq22.1 



proteasome 



PSMC1 



macropain) 26S 
subunit, ATPase, 
1 

49 204835_at polymerase (DNA POLA 

directed), alpha 

50 212071_s_at Spectrin, beta, SPTBN1 

non-erythrocytic 1

51 212419_at EST 

52 212718_at Hypothetical MGC537 

protein MGC5370 

53 213502_x_at Homo sapiens FU3231^ 

cDNA FLJ32313



Xp22.1- 

p21.3 

2p21 



NM_022081.1


30.1 


Below 


3.4 


NM_004182.1 


30.1


Above 


1.5 


NM_016542.1


30.1 


Above 


2.5 


NM_023010.1 


30.1 


Above 


2.3 


NM_024657.1


30.1 


Above 


6.9 


AA778552 


30.1 


Above 


4.3 


AI569476 


30.1 


Above 


3.2 


NM_019597.1 


30.0 


Above 


2.0 


"KT\/f fWY*fY>9 1 

INJYL UlOUZZ.l 






1.6 


NM_001920.1 


30.0 


Above 


1.5 


NM_006406.1


30.0 


Above 


1.9 


NM_024863.1 


30.0 


Above 


3.6 


NM_014214.1 


30.0 


Above 


4.1 


NM_002802.1


30.0 


Above 


1.3 


NM_016937.1


30.0 


Above 


2.0 


BE968833 


30.0 


Below 


1.7 


AL049949.1 


30.0 


Above 


13.1 


BG110231


30.0 


Above 


1.5 


3 X03529 


30.0 


Below 


1.8 






54 214051_at 

55 226039_at 



57 200642_at 

58 200799_at 

59 200943_at 



60 201018_at 



fis, clone 
PROST2003232, 
weakly similar to 
BETA- 

GLUCURONIDA 
SE PRECURSOR 
(EC 3.2.1.31) 

Thymosin, beta TMSNB 



Mannosyl (alpha- 

1,3)-glycoprotein

beta-1,4-N-

acetylglucosaminy 

ltransferase 

hypothetical 

protein 

MGC15737 

Superoxide 

dismutase 1, 

soluble 

Heat shock 70kD 
protein 1A 
High-mobility 
group (nonhistone 
chromosomal) 
protein 14; 
member of the 
HMG 14/17 
family 
Eukaryotic 
translation 
initiation factor 
1A 



Xq21.33- 
q22.3 
MGAT4A 2qll.2 



Xq22.1 
21q22.11



HSPA1A 
HMG 14 



6p21.3 
21q22.2 



BF677486 30.0 

AW006441 30.0 

AA847654 30.0 

NM_000454.1 26.7 

NM_005345.3 26.7 

NM_004965.1 26.7 



Above 3.1 

Above 3.0 

Above 5.6 

Above 2.3 

Above 2.7 

Above 1.6 



Xp22.12 BE542684 26.7 Above 1.8 



61 201311_s_at SH3 domain SH3BGR Xq13.3 AL515318 26.7 Above 1.6



62 201443_s_at 

63 201472_at 

64 201689_s_at 

65 202602_s_at 

66 203041_s_at

67 203102_s_at 

68 203744_at 



acid-rich protein 
like 

ATPase, H+ 
transporting, 
lysosomal 
interacting protein 
2 

Von Hippel- 
Lindau binding 
protein 1 
Tumor protein 
D52 

HIV TAT specific 
factor 1 
Lysosomal- 
associated 
membrane protein 
2 

Mannosyl (alpha- 

1,6-)-glycoprotein

beta-1,2-N-

acetylglucosaminy 

ltransferase 

High-mobility 



ATP6IP2 Xq21 AF248966.1 26.7 Above 1.9 



VBP1 

TPD52 

HTATSF1 

LAMP2 



Xq28 



Xq26.1- 

q27.2 

Xq24 



NM_003372.2 26.7 Above 1.7



BE974098 26.7 
NM_014500.1 26.7 
J04183.1 26.7 



Below 4.3 
Above 1.5 
Above 3.1 



MGAT2 14q21 NM_002408.2 26.7 Above 1.6



Xq28 NM_005342.1 26.7 Above 1.9



69 205518_s_at 



70 208683_at 



group (nonhistone 

chromosomal) 

protein 4 

Cytidine 

monophosphate- 

N- 

acetylneuraminic 
acid hydroxylase 
(CMP-N- 
acetylneuraminate
monooxygenase) 
Calpain 2, (m/II) 
large subunit; 
calcium- 
dependent Cys 
protease. 
Phosphoribosyl 
pyrophosphate 
synthetase 1; 
purine 



72 210786_s_at 

73 212070_at 

74 213334_x_at 

75 215117_at 

76 218694_at 

77 222741_s_at

78 223082_at 

79 225105_at 

80 225406_at 

81 225553_at 

82 226199_at 

83 226875_at 

84 232974_at 

85 46323_at 



Friend leukemia 
virus integration 1 
G protein-coupled 
receptor 56 
Three prime repair 
exonuclease 2 
Recombination 



V(D)J 

recombinase. 
ALEX1 protein 

hypothetical 
protein FLJ11101
SH3 -domain 
kinase binding 
protein 1 

clone MGC23936 
IMAGE:3838595, 
mRNA, complete 
cds 

Twisted 
gastrulation 
Homo sapiens 
cDNA FLJ12874 
fis 

Hypothetical 
protein 
MGC23937 
Hypothetical 
protein FLJ32122
cDNA FLJ12417
fis 

SCAN-1 Ca-H-- 
dependent ER 



CMAH 


6p22-p23 NM_003570.1 


26.7 


Below 


2.9 


CAPN2 


1q41-q42 M23254.1


26.7 


Above 


2.2 


PRPS1 


Xq21- 


BC001605.1 


26.7 


Above 


1.4 




q27 










FLI1


11q24.1-


M93255.1 


26.7 


Below 


2.5 




q24.3 








2.4 


GPR56 


16q13


AL554008 


26.7 


Above 


TREX2


Xq28 


BE676218 


26.7 


Above 


1.7 


RAG2 


11p13


AW058148 


26.7 


Below 


27.2 


ALEX1 


Xq21.33- 


NM_016608.1 


26.7 


Above 


2.8 




q22.2 








1.5 


FLJ11101


6p21.1 


AI761426 


26.7 


Above 


SH3KBP1 


Xp22.1- 


AF230904.1 


26.7 


Above 


2.0 




p21.3












12q23.3 


BF969397 


26.7 


Above 


2.1 


TSG 


18p11.3


AA195009 


26.7 


Above 


1.9 




14q22.2 


AL042817 


26.7 


Above 


1.6 


MGC2393 
7 


Xq13.1


AL563795 


26.7 


Above 


2.1 


FLJ32122 


Xq24 


AI742838 


26.7 


Above 


2.3 




Xp22.31 


AU148256 


26.7 


Above 


3.1 


SHAPY 


17q25.3 


AL120741 


26.7 


Above 


1.7 



86


203694_s_at 


DEAD/H (Asp- 


DDX16 


6p21.3 






Glu-Ala-Asp/His) 










box polypeptide 










16 






87 


200658_s_at 


Prohibitin




17q21 


88 


201898_s_at 


ubiquitin- 




Xq24- 






conjugating 




q25






enzyme E2A 










(RAD 6 homolog) 






89 


203556_at 


KIAA0854 


KIAA085 






protein 


4 


P 


90 


203745_at 


Holocytochrome c 








synthase 










(cytochrome c 










heme-lyase) 






91 


203909_at 


Solute carrier 


SLC9A6 


Xq26.3 






family 9 










(sodium/hydrogen 










exchanger), 
isoform 6 






92 


204446_s_at 


Arachidonate 5- 




10q11.2






lipoxygenase 






93 


205191_at 


Retinitis 


RP2 






pigmentosa 2 (X- 




p11.21






linked recessive) 






94 


206874_s_at 


Ste20-related 
serine/threonine


SLK 


10q25.1 


95 


208073_x_at 


kinase 

Tetratricopeptide 


TTC3 


21q22.2






repeat domain 3 




6p21


96 


209056_s_at 


CDC5 cell 


CDC5L 






division cycle 5- 










like (S. pombe) 






97 


210645_s_at 


Tetratricopeptide 


TTC3 


21q22.2 






repeat domain 3 






98 


215773_x_at 


ADP- 


ADPRTL2 14q11.2-






ribosyltransferase 




q12






(NAD+; 










poly(ADP-ribose) 










polymerase)-like 2 


Xp11.23-


99 


215884_s_at 


Ubiquilin 2 


UBQLN2 










p11.1


100


217954_s_at


PHD finger 


PHF3 


6 






protein 3 







26.3 
26.3 

26.3 
26.3 



AK001029.1 26.3 
NM_015153.1 26.3



Above 1.3 



Above 2.0 
Above 1.6 



Below 1.6 

Above 2.1 

Above 1.9 

Above 4.2 

Above 2.1 

Above 1.6 

Above 1.9 

Above 1.4 

Above 2.2 

Above 1.6 

Above 1.9 

Above 1.5 



Table 66. Top 100 chi-square probe sets selected for MLL

U133 probe set   Description   Symbol   Chromosomal location   GenBank reference   Chi-square value   MLL above/below mean   Fold change



1 202603_at a disintegrin and ADAM10 15q22 N51370 44.6 Above 1.8
metalloproteinase 
domain 10 

2 219463_at chromosome 20 C20orf103 20p12 NM_012261.1 44.6 Above 24.7

open reading 
frame 103 

3 224772_at neuron navigator 1 NAV1 AB032977.1 44.6 Below 3.8 

4 204069_at Meis1, myeloid MEIS1 2p14-p13 NM_002398.1 44.4 Above 73.7








ecotropic viral 










integration site 1 










homolog






5 


218966_at


myosin 5C 


MYO5C


15q21 


6 


226939_at 


cDNA FLJ37247 


FLJ37247 




7 


204446_s_at 


fis 

arachidonate 5- 


ALOX5 


10q11.2






lipoxygenase 




3p14.2


8 


206492_at 


fragile histidine 


FHIT 






triad gene 






9 


212588_at 


protein tyrosine 


PTPRC 


1q31-q32






phosphatase. 










receptor type, C 




9pll.2 


10 


215925_s_at 


CD72 antigen 


CD72 






(ligand for CD5) 








211733_x_at


sterol carrier 


SCP2 


1p32






protein 2 






12 


212386_at 


cDNA FLJ11918 


FLJ11918 




13 


218764_at 


fis 

Protein Kinase C 


PRKCH 


14q22.1- 






eta isoform. 




q22.3 


14 


218847_at 


IGF-II mRNA- 


IMP-2 


3q28 


15 


222409_at 


binding protein 2 
coronin, actin 


CORO1C


12q24.1 






binding protein, 










1C 






16 


242172_at 


ESTs 






17 


201153_s_at 


muscleblind-like 


MBNL 


3q25 






(Drosophila) 




10q23- 


18 


210487_at 


deoxynucleotidyltr DNTT 






ansferase, terminal 


q24 


19 


219686_at 


gene for 


HSA2508 


4p16.2






serine/threonine 


39 








protein kinase 






20 


226981_at


Homo sapiens, 










clone 










IMAGE:4401491, 










mRNA 






21 


203375_s_at 


tripeptidyl 


TPP2 


13q32- 






peptidase II 




q33 


22 


221676_s_at


coronin, actin 


CORO1C


12q24.1 






binding protein, 










1C 






23 


201152_s_at 


muscleblind-like 


MBNL 


3q25 






(Drosophila) 






24 


221773_at 


ELK3, ETS- 


ELK3 


12q23 






domain protein 










(SRF accessory 










protein 2) 






25 


201162_at 


insulin-like 


IGFBP7 


4q12






growth factor 







binding protein 7 

26 201163_s_at insulin-like IGFBP7 4q12

growth factor 
binding protein 7 

27 203836_s_at mitogen-activated MAP3K5 6q22.33 

protein kinase 
kinase kinase 5 

28 203837_at mitogen-activated MAP3K5 6q22.33 




NM_018728.1
AI202327 


44.4 
44.4 


Below 
Above 


4.5 
6.9 


NM_000698.1 


40.7 


Below 


66.8 


NM_002012.1 


40.7 


Below 


36.6 


AI809341 


40.7 


Above 


2.3 


AF283777.2 


40.7 


Above 


3.0 


BC005911.1 


40.1 




1.5 


AK021980.1 


40.1 


Below 


3.1 


NM_024064.1 


40.1 


Below 


7.6 


NM_006548.1 


40.1 


Above 


23.2 


a t 1 /tomn 1 






4.8 


N50406 
NM_021038.1 


40.1 
40.0 


Above 
Above 


33.6 
2.1 


M11722.1


40.0 


Below 


2.9 


NM_018401.1 


40.0 


Below 


28.3 


AW002079 


37.4 


Below 


1.0 


NM_003291.1 


37.2 


Above 


1.6 


BC002342.1 


37.2 


Above 


3.5 


NM_021038.1 


36.2 


Above 


2.2 


AW575374 


36.2 


Below 


8.2 


NM_001553.1 


36.0 


Above 


4.3 


NM_001553.1 


36.0 


Above 


4.0 


D84476.1 


36.0 


Above 


13.9 


NM_005923.2 


36.0 


Above 


4.2 






kinase kinase 5 

29 213891_s_at cDNA FLJ11918 FLJ11918

fis 

30 214895_s_at a disintegrin and ADAM10 15q22 AU135154 

metalloproteinase 
domain 10 

31 226415_at KIAA1576 KIAA157 16q22.1 AA156723 

protein 6 

32 235879_at ESTs 

33 212387_at cDNA FLJ11918 FLJ11918



AI927067 



AI697540 
AK021980.1 



36.0 
36.0 



36.0 
35.8 



Below 
Above 



Above 
Below 



34 218988_at 

35 228555_at 



bladder cancer 



BLOV1 12q15
CAMK2D 



NM_018656.1 35.8 
AA029441 35.8 



protein 

EST; by BLAT 
calcium/calmoduli 
n-dependent 
Protein Kinase
type II Delta chain 
(CAMK GROUP 

36 202975_s_at Rho-related BTB RHOBTB 5q21.2 N21138 35.3 Above

domain containing 3 
3 

37 201105_at lectin, galactoside- LGALS1 22q13.1 NM_002305.2 34.5 Above

binding, soluble, 1 
(galectin 1) 

38 203434_s_at membrane MME 3q25.1- AI433463 34.1 Below 

metallo- q25.2

endopeptidase 

(neutral 



CALLA, CD10) 

39 212135_s_at calcium ATP2B4 AW517686 34.1 Below 2.4 



40 212136_at calcium ATP2B4 AW517686 34.1 Below 2.1 



ATPase plasma 
membrane 

DKFZp54 N52572 34.1 Below 6.4 

DKFZp547P158 7P158 

42 218217_at likely homolog of RISC 17q23.2 NM_021626.1 32.8 Above 3.4 

rat and mouse 

retinoid-inducible 

serine 

carboxypeptidase 

43 225841_at hypothetical FLJ30525 1p13.2 BE502436 32.8 Above 1.8

protein FLJ30525 

44 226668_at Homo sapiens, W80623 32.8 Above 2.4 

similar to WD 
domain, G-beta 
repeat containing 






45  200989_at  hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)  HIF1A  14q21-q24  NM_001530.1  32.2  Below  1.8
46  201151_s_at  muscleblind-like (Drosophila)  MBNL  3q25  NM_021038.1  32.2  Above  2.6
47  201563_at  sorbitol dehydrogenase  SORD  15q15.3  -  -  Above  1.8
48  203753_at  transcription factor 4  -  -  NM_003199.1  32.2  -  2.9
49  205668_at  lymphocyte antigen 75  LY75  2q24  NM_002349.1  32.2  -  2.1
50  206471_s_at  plexin C1  PLXNC1  -  NM_005761.1  32.2  Above  7.7
51  211302_s_at  phosphodiesterase 4B, cAMP-specific  PDE4B  1p3  L20966.1  32.2  Below  3.0
52  212012_at  Melanoma associated gene  D2S448  2pter-p25.1  AF200348.1  32.2  Below  2.4
53  212063_at  CD44 antigen  CD44  11p13  BE903880  32.2  Above  3.1
54  213241_at  plexin C1  PLXNC1  -  AF035307.1  32.2  Above  2.5
55  214651_s_at  homeo box A9  HOXA9  7p15-p14  U41813.1  32.2  Above  28.5
56  218140_x_at  APMCF1 protein  APMCF1  3q22.2  NM_021203.1  32.2  Above  1.4
57  219988_s_at  hypothetical protein FLJ10597  FLJ10597  1p34.1  NM_018150.1  32.2  Above  1.9
58  223046_at  egl nine homolog 1 (C. elegans)  EGLN1  1q42.1  NM_022051.1  32.2  Below  4.2
59  224150_s_at  p10-binding protein  -  3q22-q23  AF289495.1  32.2  -  2.1
60  224933_s_at  hypothetical protein DKFZp761F0118  DKFZp761F0118  10q22.1  AB037801.1  32.2  Above  1.9
61  201078_at  transmembrane 9 superfamily member 2  TM9SF2  13q32.3  NM_004800.1  32.0  Above  1.5
62  205550_s_at  brain and reproductive organ-expressed (TNFRSF1A modulator)  BRE  2p23.3  NM_004899.1  32.0  Above  2.0
63  212382_at  cDNA FLJ11918 fis  FLJ11918  -  AK021980.1  32.0  Below  2.7
64  225019_at  calcium/calmodulin-dependent protein kinase (CaM kinase) II delta  -  4q25  AA777512  32.0  -  3.6
65  225202_at  Rho-related BTB domain containing 3  RHOBTB3  5q21.2  BE620739  32.0  Above  5.5
66  228855_at  nudix (nucleoside diphosphate linked moiety X)-type motif 7  NUDT7  -  AI927964  32.0  Above  5.6
67  231899_at  KIAA1726 protein  KIAA1726  11q23.1  AB051513.1  32.0  Above  33.0
68  52164_at  chromosome 11 open reading frame 24  C11orf24  11q13  AA065185  32.0  Above  2.3

69 212660_at KIAA0239 

protein 

70 213513_x_at actin related 

protein 2/3 
complex, subunit 
2, 34kDa 
hypothetical 
protein FLJ23309 
ESTs 

brain abundant, 
membrane 
attached signal 
protein 1 

74 202604_x_at a disintegrin and 



71 222603_at 



72 238558_at 

73 20239 l_at 



5q31.1 
2q36.1 



5pl5.1- 
pl4 



AI735639 31.7 Below 
BG034239 31.7 Above 



AL136980 31.7 Above 



AI445833 31.7 Above 
NM 006317.1 31.3 Above 



metalloproteinase 
a 10 



ADAM 10 15q22 NM001 110.1 31.3 Above 



75 203435_s_at 



(neutral 
endopeptidase, 
enkephalinase, 
CALLA, CD10) 
76 204445_s_at arachidonate 5- 



77 209705 at 



78 214366 

79 215000. 



lipoxygenase 

likely orfholog of M96 

mouse metal 



ALOX5 



3q25.1- 
q25.2 



10qll.2 
Ip22.1 



NM_007287.1 31.3 Below 



AI361850 31.3 Below 
AF073293.1 31.3 Below 



82 238712 

83 229686. 



84 222620. 

85 224516 



transcription 

factor 2 
s at arachidonate 5- 

lipoxygenase 
s_at fasciculation and FEZ2 

elongation protein 

zeta 2 (zygin II) 
s at Fas apoptotic FATM 

inhibitory 

molecule 
at Homo sapiens 

gastric cancer- 
related protein 

GCYS-20 (gcys- 

20) mRNA, 

complete cds; 

homology with 

mouse epidermal 

growth factor 

receptor pathway 

substrate 8 
at ESTs 

at cDNA FLJ35637 FLJ35637 



10qll.2 AA995910 
2p21 AL1 17593.1 



31.3 
31.3 



BF801735 
AI436587 



31.3 
31.0 



Below 
Above 



NM_018147.1 31.3 Above 



AW575754 31.3 Above 



Above 
Below 



s_at hypothetical DNAJL1 10pll.23 BF591419 29.8 Above 

protein similar to 
mouse Dnajll 

s_at hypothetical HSPC195 5q31.3 BC006428.1 29.8 Above 

protein HSPC195 




1.8 

54.8 




86 203217_s_at 






87 204030_s_at 



209191_at 
213541 s_at 



90 213773_x_at 

91 219243_at 

92 219256_s_at 

93 223358_s_at 

94 224796_at 

95 203076_s_at 

96 212385_at 

97 216026_s_at 

98 217118_s_at 

99 219821_s_at 

100 201875_s_at 



sialyltransferase 9 SIAT9 
(CMP- 

NeuAc:lactosylcer 
amide alpha-2,3- 
sialyltransferase; 
GM3 synthase) 

SCHIP1 



TUBB-5 



region 20A 

immunity HIMAP4 

associated protein 

4 

hypothetical FLJ20356 
protein FLJ20356 
phosphodiesterase PDE7A 
7A 

development and DDEF1 



1 

tubulin beta-5 
v-ets ERG 
erythroblastosis 
virus E26 
oncogene like 
(avian) 

Williams Beuren WBSCR2 
syndrome OA 



q24.2 



enhancing factor 1 

MAD, mothers MADH2 



decapentaplegic 
homolog 2 
(Drosophila) 

cDNA FLJ1 1918 FLJ11918 
fis 

polymerase (DNA POLE 
directed), epsilon 
KIAA0930 KIAA093 
protein 0 
hypothetical FLJ20330 
protein FLJ20330 
hypothetical FLJ21047 
protein FLJ2 1047 



6pter- 
p22.1 



NM_003896.1   28.8   Below   2.1
NM_014575.1   28.8   -       17.6
BC002654.1    28.8   Above   6.4
AI351043      28.8   Below   2.8
AW248552      28.8   Above   1.3
NM_018326.1   28.8   Below   13.4
NM_018986.1   28.8   Below   2.6
AW269834      28.8   Above   1.5
W03103        28.8   Below   1.8
U65019.1      28.7   Below   2.0
AK021980.1    28.7   Below   3.2
AL080203.1    28.7   Below   3.0
AK025608.1    28.7   Above   1.9
NM_018988.1   28.7   Below   5.5
NM_024569.1   28.5   Above   2.0



Table 67. Top 100 chi-square probe sets selected for T-ALL 



No.  U133 probe set  Gene Description  Symbol  Chromosomal Location  GenBankRef  Chi-square  T-ALL above/below mean  Fold change

1  201137_s_at  major histocompatibility complex, class II, DP beta 1  HLA-DPB1  6p21.3  NM_002121.1  100.0  Below  21.0
2  202113_s_at  sorting nexin 2  SNX2  5q23  AF043453.1  100.0  Below  4.2
3  202114_at  sorting nexin 2  SNX2  5q23  NM_003100.1  100.0  Below  4.6
4  203675_at  nucleobindin 2  NUCB2  11p15.1-p14  NM_005013.1  100.0  Above  3.6
5  204670_x_at  major histocompatibility complex, class II, DR beta 3  HLA-DRB3  6p21.3  NM_002125.1  100.0  Below  13.4
6  205297_s_at  CD79B antigen (immunoglobulin-associated beta)  CD79B  17q23  NM_000626.1  100.0  Below  23.3
7  205456_at  CD3E antigen, epsilon polypeptide (TiT3 complex)  CD3E  11q23  NM_000733.1  100.0  Above  20.7
8  206398_s_at  CD19 antigen  CD19  16p11.2  NM_001770.1  100.0  Below  5693.6
9  208306_x_at  major histocompatibility complex, class II, DR beta 4  HLA-DRB4  6p21.3  NM_021983.2  100.0  Below  8.3
10  208894_at  major histocompatibility complex, class II, DR alpha  HLA-DRA  6p21.3  M60334.1  100.0  Below  20.9
11  209312_x_at  major histocompatibility complex, class II, DR beta 1  HLA-DRB1  6p21.3  U65585.1  100.0  Below  12.6
12  209619_at  CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)  CD74  5q32  K01144.1  100.0  Below  15.1
13  210116_at  SH2 domain protein 1A, Duncan's disease (lymphoproliferative syndrome)  SH2D1A  Xq25-q26  AF072930.1  100.0  Above  150.7
14  210982_s_at  major histocompatibility complex, class II, DR alpha  HLA-DRA  6p21.3  M60333.1  100.0  Below  23.4
15  211990_at  major histocompatibility complex, class II, DP alpha 1  HLA-DPA1  6p21.3  M27487.1  100.0  Below  19.6
16  211991_s_at  major histocompatibility complex, class II, DP alpha 1  HLA-DPA1  6p21.3  M27487.1  100.0  Below  24.5
17  213539_at  CD3D antigen, delta polypeptide (TiT3 complex)  CD3D  11q23  NM_000732.1  100.0  Above  35.7
18  214049_x_at  CD7 antigen (p41)  CD7  17q25.2-q25.3  AI829961  100.0  Above  312.2
19  214551_s_at  CD7 antigen (p41)  CD7  17q25.2-q25.3  NM_006137.2  100.0  Above  228.1




20  217147_s_at  T-cell receptor interacting molecule  TRIM  3q13  AJ240085.1  100.0  Above  42.6
21  217478_s_at  MHC, class IIa, HLA-DMA  HLA-DMA  -  X76775  100.0  Below  11.9
22  221969_at  paired box gene 5 (B-cell lineage specific activator protein)  PAX5  9p13  BF510692  100.0  Below  3922.0
23  227646_at  early B-cell factor  EBF  5q34  BG435302  100.0  Below  85.0
24  229487_at  cDNA FLJ39389 fis  FLJ39389  5  W73890  100.0  Below  7685.7
25  229838_at  cDNA FLJ39156 fis  FLJ39156  -  [illegible]  -  -  12.7
26  232204_at  early B-cell factor  EBF  5q34  AF208502.1  100.0  Below  7129.1
27  203965_at  ubiquitin specific protease 20  USP20  9q34.12-q34.13  NM_006676.1  91.3  Above  9.0
28  204891_s_at  lymphocyte-specific protein tyrosine kinase  LCK  1p34.3  NM_005356.1  91.3  Above  13.8
29  205255_x_at  transcription factor 7 (T-cell specific, HMG-box)  TCF7  5q31.1  NM_003202.1  91.3  Above  8.4
30  207655_s_at  B-cell linker  BLNK  10q23.2-q23.33  NM_013314.1  91.3  Below  103.2
31  209771_x_at  CD24 antigen (small cell lung carcinoma cluster 4 antigen)  CD24  6q21  AA761181  91.3  Below  40.1
32  211796_s_at  T cell receptor beta locus  TRB  7q34  AF043179.1  91.3  Above  20.7
33  213792_s_at  insulin receptor  INSR  19p13.3-p13.2  AA485908  91.3  Below  8.0
34  215193_x_at  major histocompatibility complex, class II, DR beta 3  HLA-DRB3  6p21.3  AJ297586.1  91.3  Below  12.1
35  216379_x_at  KIAA1919 protein  KIAA1919  6q22.1  AK000168.1  91.3  Below  44.0
36  219191_s_at  bridging integrator 2  BIN2  12q13  NM_016293.1  91.3  Above  271.0
37  219563_at  hypothetical protein FLJ21276  FLJ21276  -  NM_024633.1  91.3  -  5.8
38  219724_s_at  KIAA0748 gene product  KIAA0748  12q12  NM_014796.1  91.3  Above  11.6
39  221750_at  3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble)  HMGCS1  5p14-p13  BG035985  91.3  Above  3.4
40  226157_at  cDNA FLJ39131  FLJ39131  3  AI569747  91.3  Above  4.4
41  226496_at  hypothetical protein FLJ22611  FLJ22611  9p11.1  BG291039  91.3  Below  7.6
42  266_s_at  CD24 antigen (small cell lung carcinoma cluster 4 antigen)  CD24  6q21  L33930  91.3  Below  69.7






43  -  T-cell leukemia/lymphoma 1A  TCL1A  14q32.1  X82240  91.3  Below  367.4
44  204214_s_at  RAB32, member RAS oncogene family  RAB32  6q24.3  NM_006834.1  90.6  Above  127.9
45  204777_s_at  mal, T-cell differentiation protein  MAL  2cen-q13  NM_002371.2  90.6  Above  96.8
46  204890_s_at  lymphocyte-specific protein tyrosine kinase  LCK  1p34.3  U07236.1  90.6  Above  18.6
47  205049_s_at  CD79A antigen (immunoglobulin-associated alpha)  CD79A  19q13.2  NM_001783.1  90.6  Below  11.4
48  205254_x_at  transcription factor 7 (T-cell specific, HMG-box)  -  -  AW027359  90.6  -  352.0
49  -  Bruton agammaglobulinemia tyrosine kinase  BTK  Xq21.33-q22  NM_000061.1  90.6  Below  6.6
50  210915_x_at  T cell receptor beta locus  TRB  7q34  M15564.1  90.6  Above  15.9
51  211211_x_at  SH2 domain protein 1A, Duncan's disease (lymphoproliferative syndrome)  SH2D1A  Xq25-q26  AF100542.1  90.6  Above  1963.5
52  213830_at  T cell receptor delta locus  TRD  14q11.2  AW007751  90.6  Above  7411.2
53  216191_s_at  T cell receptor delta locus  -  -  X72501.1  90.6  -  253.7
54  217143_s_at  T cell receptor delta locus  TRD  14q11.2  X06557.1  90.6  Above  151.9
55  219528_s_at  B-cell CLL/lymphoma 11B (zinc finger protein)  -  [illegible]  NM_022898.1  90.6  -  11.6
56  -  ubiquitin associated and SH3 domain containing, A  UBASH3A  -  NM_018961.1  90.6  -  759.3
57  222895_s_at  B-cell CLL/lymphoma 11B (zinc finger protein)  BCL11B  14q32.31-q32.32  AA918317  90.6  Above  11.7
58  223553_s_at  hypothetical protein FLJ22570  FLJ22570  5q35.3  BC004564.1  90.6  -  6.1
59  225090_at  HRD1 protein  HRD1  11q12  AA844682  90.6  Below  3.6
60  226459_at  Homo sapiens gastric cancer-related protein GCYS-20 (gcys-20) mRNA, cds  -  -  AW575754  90.6  Below  10.7
61  228314_at  cDNA FLJ37485 fis  FLJ37485  -  BE877357  90.6  Below  4.7




62 201384_s_at 



63 202540_s_at 

64 203198_at 

65 203932_at 

66 204613_at 






membrane M17S2 
component, 
chromosome 17, 
surface marker 2 
(ovarian 

carcinoma antigen 
CA125) 

3-hydroxy-3- HMGCR 
methylglutaryl- 
Coenzyme A 
reductase 

cyclin-dependent CDK9 
kinase 9 (CDC2- 
related kinase) 
major HLA- 
histocompatibility DMB 
complex, class n, 
DM beta 

phospholipase C, PLCG2 



17q21.1 NM_005899.1 



5ql3.3- 
ql4 



9q34.1 
6 P 21.3 



NM 000859.1 

NM_001261.1 
NM_002118.1 



16q24.1 NM_002661.1 



68 208650_s_at 

69 20865 l_x_at 

70 209995_s_at 

71 210038_at 

72 211126_s_at 

73 220068_at 

74 226245_at 

75 202615_at 

76 224861_at 

77 201194_at 

78 201349_at 



(phosphatidylinosi 
tol-specific) 

POU domain, POU2AF1 llq23.1 NM_006235.1 
class 2, 

associating factor 

1 

CD24 antigen CD24 6q21 BG327863 
(small cell lung 
carcinoma cluster 



CD24 antigen 
(small cell lung 
carcinoma cluster 



CSRP2 



T-cell 

leukemia/lymphi 
ma 1A 

protein kinase C, PRKCQ 
theta 

cysteine and 
glycine-rich 
protein 2 

pre-B lymphocyte VPREB3 
gene 3 

cDNA DKFZp45 
DKFZp451C132 1C132 
cDNA DKFZp68 
DKFZp686D0521 6D0521 
CDNAFLJ31057 FLJ31057 
fis 

selenoprotein W, SEPW1 
1 

SLC9A3R 



TCL1A 14q32.1 BC003574.1 



10pl5 
12q21.1 



19ql3.3 
17q25.2 



solute cai 
family 9 
(sodium/hydrogen 
exchanger), 
isoform 3 
regulatory factor 1 
79 202539_s_at 3-hydroxy-3- HMGCR 5ql33 




AL137145 
U46006.1 

NMJH3378.1 

U55984 

BF222895 

BF477658 

NM_003009.1 

NM 004252.1 



83.8 


Above 


3.3 


83.8 


Above 


4.4 


83.8 


Below 


4.8 


83.8 


Below 


7.9 


83.8 


Below 


3.9 


83.8 


Below 


11.2 


83.8 


Below 


74.7 


83.8 


Below 


52.7 






2 


83.8 


Above 


12.7 


83.8 


Below 


18.0 


83.8 


Below 


6559.8 


83.8 


Above 


8.7 


82.2 


Above 


3.1 


82.2 


Above 


3.5 


82.0 


Above 


3.8 


82.0 




2.9 


82.0 


Above 


3.5 






methylglutaryl- ql4 

Coenzyme A 

reductase 

S_s_at transcription TFDP2 3q23 BG034328 82.0 Above 
factor Dp-2(E2F 



partner 2) 

81  204852_s_at  protein tyrosine phosphatase, non-receptor type 7  PTPN7  1q32.1  NM_002832.1  82.0  Above  9.5
82  207434_s_at  FXYD domain containing ion transport regulator 2  FXYD2  11q23  NM_021603.1  82.0  Above  14.6
83  208872_s_at  DNA segment, single copy probe LNS-CAI/LNS-CAII  D5S346  5q22-q23  AA814140  82.0  Below  2.6
84  209200_at  MADS box transcription enhancer factor 2, polypeptide C (myocyte enhancer factor 2C)  MEF2C  5q14  N22468  82.0  Below  7.5
85  212795_at  KIAA1033 protein  KIAA1033  12q24.11  AL137753.1  82.0  Below  2.4
86  212827_at  immunoglobulin heavy constant mu  IGHM  14q32.33  X17115.1  82.0  Below  13.1
87  213193_x_at  T cell receptor beta locus  TRB  7q34  AL559122  82.0  Above  10.9
88  221002_s_at  tetraspanin similar to TM4SF9  DC-TM4F2  10q23.2  NM_030927.1  82.0  Below  2.1
89  225314_at  hypothetical protein MGC45416  MGC45416  4p12  BG291649  82.0  Above  5.5
90  227432_s_at  insulin receptor  INSR  19p13.3-p13.2  AI215106  82.0  Below  6.0
91  203332_s_at  inositol polyphosphate-5-phosphatase, 145kDa  INPP5D  2q36-q37  NM_005541.1  81.5  Below  2.2
92  203589_s_at  transcription factor Dp-2 (E2F dimerization partner 2)  TFDP2  3q23  NM_006286.1  81.5  Above  35.1
93  205674_x_at  FXYD domain containing ion transport regulator 2  FXYD2  11q23  NM_001680.2  81.5  Above  12.2
94  209881_s_at  Linker for activation of T cells  LAT  16q13  AF036905.1  81.5  Above  1823.4
95  211005_at  Linker for activation of T cells  LAT  16q13  AF036906.1  81.5  Above  67.8
96  211075_s_at  CD47  CD47  -  Z25521.1  81.5  Above  2.1
97  211210_x_at  SH2 domain protein 1A, Duncan's disease (lymphoproliferative syndrome)  SH2D1A  Xq25-q26  AF100539.1  81.5  Above  300.2
98  213601_at  slit homolog 1 (Drosophila)  SLIT1  10q23.3-q24  AB011537.2  -  Above  1752.1
99  213857_s_at  CD47 antigen (Rh-related antigen, integrin-associated signal transducer)  CD47  3q13.1-q13.2  BG230614  -  Above  2.2
100  214924_s_at  KIAA1042 protein  KIAA1042  3p25.3-p24.1  AK000754.1  81.5  Below  -



Table 68. Top 100 chi-square probe sets selected for TEL-AML1 



No.  U133 probe set  Gene Description  Symbol  Chromosomal Location  GenBankRef  Chi-square value  TEL-AML1 above/below mean  Fold change

1  224722_at  KIAA1323  KIAA1323  18q11.1  W80418  75  Above  7.6
2  227377_at  FLJ12722  FLJ12722  17q21.32  AK022784.1  -  Above  2446.3
3  237206_at  EST  -  17p12  AI452798  -  Above  23.7
4  241505_at  EST  -  -  BF513468  75  Above  13.4
5  203184_at  Fibrillin 2 (congenital contractural arachnodactyly)  FBN2  5q23.2  NM_001999.2  69.1  Above  14.4
6  205109_s_at  Rho guanine nucleotide exchange factor (GEF) 4  -  2q22  NM_015320.1  69.1  Above  148.1
7  210650_s_at  Piccolo  PCLO  7q21.11  BC001304.1  69.1  -  101.2
8  213558_at  Piccolo  PCLO  7q21.11  AB011131.1  69.1  Above  77.5
9  220451_s_at  Livin IAP (inhibitor of apoptosis)  BIRC7  20q13.3  NM_022161.1  69.1  Above  25.4
10  224720_at  KIAA1323  KIAA1323  18q11.1  W80418  69.1  Above  4.3
11  235694_at  IMAGE:4661943 Unknown EST  -  20q13.33  N49233  69.1  Above  9.3
12  202808_at  Hypothetical protein FLJ20154  FLJ20154  10q24.32  AK000161.1  68.9  Above  3.7
13  206032_at  Desmocollin 3  DSC3  18q12.1  AI797281  68.9  Above  54.1
14  206033_s_at  Desmocollin 3  DSC3  18q12.1  NM_001941.2  68.9  Above  357.1
15  209228_x_at  Putative prostate cancer tumor suppressor gene N33  N33  8p22  U42349.1  68.9  Above  20.8
16  224725_at  KIAA1323  KIAA1323  18q11.1  W80418  68.9  Above  3.6
17  203910_at  PTPL1-associated RhoGAP  PARG1  1p22.1  NM_004815.1  64  Above  7.1
18  204849_at  Transcription factor-like 5 (helix-loop-helix)  TCFL5  20q13.33  NM_006602.1  64  Above  8.9



19 20623 l_at 



20 20805 6_s_at 



21 211222_s_at 



22 223468_s_at 



23 227266_s_at 

24 228158_at 

25 37986_at 

26 203464_s_at 

27 213317_at 

28 213423_x_at 



29 226817_at 

30 227862_at 

31 229339_at 

32 211795_s_at 



Potassium 
intermediate/ small 
conductance 
calcium-activated 
channel, 
subfamily N, 
member 1 
Core-binding 
factor, runt 
domain, alpha 
subunit 2; 
translocated to, 3 
Huntingtin- 
associated protein 
1 (neuroan 1, 
HAP-1) 
hypothetical 
protein from 
EUROIMAGE 
363668 RGM: 
likely ortholog of 
chicken repulsive 
guidance molecule 
FYN-binding 
protein 
Lymphocyte- 
specific protein 1 
EPO receptor 
Epsin 2 
chloride 
intracellular 
channel 5 
Putative prostate 
cancer tumor 
suppressor 
Desmocollin2 
ESTs 
EST 

FYN binding 



33 218627_at Hypothetical 

protein FLJ 11259 

34 221748_s_at Homo sapiens 

cDNA FLJ32766 



35 200709_at 

36 204615_x_at 

37 208881_x_at 

38 213301_x_at 



FK506 binding 
protein 1A (12kD) 
Isopentenyl- 
diphosphate delta 
isomerase 
Isopentenyl- 
diphosphate delta 



Transcriptional 
intermediary 



KCNN1 


19 P 13.1 


NM_002248.2 


64 


Above 


72.7 




16q24 


NM 005 187.2 


63 


Above 


2.5 


CBFA2T3 












HAP1 


17q21.2 


AF040723.1 


63 


Above 


80.8 


RGM 


15q26.1 


AL136826.1 


63 


Above 


10.6 


FYB 


5pl3.1 


BF679849 


63 


Above 


3.1 




2pll.l 


AI623211 


63 


Above 


7.9 


EPOR 


19pl3.2 


M60459 




Ah 


15 5 


EPN2 






62 9 


Above 


43.3 


CLIC5 


6p21.1 


AL049313.1 


62.9 


Above 


99.3 


N33 


8p22 


AI884858 


62.9 


Above 


15.7 


DSC2 


18ql2.1 


AU154691 


62.9 


Above 


48.3 




lp35.1 


AA037766 


62.9 


Above 


14.7 




17pl2 


AI093327 


62.9 


Above 


31.1 


FYB 


5pl3.1 


AF 198052.1 


59.4 


Above 


4.1 


FLJ11259 


12q23.1 


NMJH8370.1 


57.9 


Above 


4.6 


TNS 


2q35 


AL046979 








FKBP1A 


20pl3 


NM.000801.1 


57.1 


Above 


1.8 


IDI1 


10pl5.3 


NM_004508.1 


57.1 




2.6 


IDI1 


10pl5.3 


BC005247.1 


57.1 




2.6 


TIF1 


7q34 


AL538264 


57.1 


Above 


2.0 






39 221747_at 

40 224726_at 



Tensin 
KIAA1323 



2q35 AL046979 
lSqll.l W80418 



57.1 
57.1 



Above 
Above 



49.2 
26.1 



41 


231455_at 


ESTs 




2p25.2 


AA768888 


57.1 


Above 


7.7 


42 


232750_at 


Homo sapiens 


FLJ13750 2q35 


AU1 58570 


57.1 


Above 


35.0 






cDNA FLJ 13750 










Above 


1.9 


43 


209685_s_at 


Protein kinase C, 

EST like 
Na+/K+/Cl- 
transporter with 
AA permease 


PRKCB1 


16pll.2 


M13975.1 


53.6 


44 


204404 at 


SLC12A2 


5q23.3 


NM 001046.1 


53.4 


Above 


2.0 
















Above 


9.0 


45 


239673_at 


ESTs 




4q31.23 


AW080999 


53.4 


46 


240950_s_at 




FLJ32658 19ql3.33 


AA400740 


53.4 


Above 


9.9 






CJJJNA r LJ 510 05 












4.5 


47 


204297_at 


Phosphoinositide- 


PIK3C3 


18ql2.3 


NM_002647.1 


52.5 


Above 






3-kinase, class 3 










Above 


5.4 


48 


20659 l_at 


Recombination 


RAG1 


Upl3 


NM 000448.1 


52.1 






activating gene 1 










Above 


17.0 


49 


209962_at 


Erythropoietin 


EPOR 


19pl3.2 


M34986.1 


52.1 
















Above 


7.6 


50 


209963_s_at 


Erythropoietin 


EPOR 


19pl3.2 


M34986.1 


52.1 
















Above 


1.8 


51 


210186_s_at 


FK506 bmdrng 


FKBP1A 


20pl3 


BC005 147.1 


52.1 






protein 1 A (12kD) 










Above 


60.3 


52 


219866_at 


Chloride 
intracellular 


CLIC5 


6p21.1 


NM_016929.1 


52.1 






channel 5 










Below 


2.8 


53 


203474_at 


IQ motif 
containing 
GTPase activating 


IQGAP2 


5ql3.2 


NM_006633.1 


51.6 


54 


210058_at 


protem 2 

Mitogen-activated 


MAPK13 


6p21.1 


BC000433.1 


51.6 


Above 


2.3 






protem kinase 13 












452.6 


55 


211891_s_at 


Rho guanine 
nucleotide 


ARHGEF 
4 


2q22 


AB042 199.1 


51.6 






exchange factor 
(GEF) 4 












2.0 


56 


214214_s_at 


Complement 
component 1, q 
subcomponent 


C1QBP 


17pl3.3 


AU151801 


51.6 


Below 






binding protein 










Above 


1.7 


57 


218152_at 


High-mobility 


HMG20A 


15q24 


NM 018200.1 


51.6 






group 20A 










Above 


2.4 


58 


234983_at 


s 


FLJ21415 12q24.22 


BE893995 


51.6 


59 


240446_at 


KIAA1323 


KIAA132 


lSqll.2 


AI798164 


51.6 


Above 


102.2 


60 


244107 at 


ESTs 


3 


18ql2.1 


AW1 89097 


51.6 


Above 


518.9 


61 


205794_s_at 


Neuro-oncological 


NOVA1 


14ql2 


NM_002515.1 


51.4 


Above 


40.4 






ventral antigen 1 












87.4 


62 


217628_at 


chloride 
intracellular 


CLIC5 


6p21.1 


BF032808 


51.4 


Above 






channel 5 












41.6 


63 


218804__at 


Hypothetical 


FLJ10261 


llql3.3 


NM_018043.1 


51.4 


Above 






protein FLJ10261 










Above 


8.7 


64 


230698_at 


EST 




7qll.22 


AW072102 


51.4 






65 225129_at 

66 201266_at 

67 20361 l_at 

68 213017_at 

69 236430_at 

70 209035_at 



71 209193_at 

72 218625_at 

73 22603 8_at 

74 232227_at 

75 204160_s_at 



77 218813_s_at 

78 227111_at 

79 202382_s_at 

80 202838_at 

81 22573 l_at 



82 225835_at 

83 229790_at 

84 230069_at 

85 235872_at 

86 239300_at 

87 241940_at 

88 203370_s_at 



Thioredoxin 
reductase 1 
Telomeric repeat 
binding factor 2 
Lung alpha/beta 
hydrolase 3 
hypothetical 
protein 
MGC23911 
Midkine (neurite 
growth-promoting 
factor 2). 
Pim-1 oncogene 
Neuritin 1 
Hypothetical 
protein FLJ23749 
EST 

Ectonucleotide 

pyrophosphatase/p 

hosphodiesterase 

4 (putative 

function) 

UDP- 

GakbetaGlcNAc 
beta 1,4- 

galactosyltransfera 

se, polypeptide 6 

SH3-domain 

GRB2-like 

endophilin B2 

Homo sapiens 

CDNAFLJ31099 

fis, clone 

IMR321000230 

Glucosamine-6- 

phosphate 



Fucosidase, alpha- 

L- 1, tissue 

Hypothetical 

protein 

KIAA1223 

FLJ21409 

Telomeric repeat 

binding factor 2 

Hypothetical 

protein FLJ 12876 

ESTs 

EST 

EST 

Enigma (LIM 
domain protein) 
LOC149153: 



89 215149_at 

90 217901_at Desmoglein2 



FLJ37548 


16ql3 


AW170571 


49.4 


Above 


3.0 


TXNRD1 
TERF2 


12q23- 
q24.1 
16q22.1 


NM 003330.1 
NM_005652.1 


48.2 
48.2 


Above 
Above 


1.7 
5.3 






AL534702 


48.2 




4.0 


MGC2391 


16q22.1 


AA708152 


48.2 


Above 


16.8 


1 

MDK 


llpll.2 


M69148.1 


47.7 


Above 


4.6 


PIM1 
FLJ23749 


6p21.2 
8p23.1 


M24779.1 
NM 016588.1 
BF680438 


47.7 
47.7 
47.7 


Above 
Above 


2.0 
5.1 

5.2 


ENPP4 


9q34.3 
6pl2.3 


AV736391 
AW194947 


47.7 
46.5 


Above 
Above 


14.7 
7.2 


B4GALT6 


18qll 


AF097159.1 


46.5 


Above 


2.6 


SH3GLB2 


9q34.11 


NM_020145.1 


46.5 


Above 


6.2 


FLJ31099 9q33 


BG179317 


46.5 


Above 


2.7 


GNPI 


5q21 


NM_005471.1 


46.2 


Above 


5.6 


FUCA1 


lp34 


NM_000147.1 


46.2 


Above 


4.8 


KIAA122 


4q26 


AB033049.1 


46.2 


Above 


2.8 


SLC12A2 5q23.2 
TERF2 16q22.1 


AK025062.1 
AW006832 


46.2 
46.2 


Above 
Above 


3.6 
7.4 


FLJ 12876 


5q35.3 


BF593817 


46.2 


Above 


9.4 


ENIGMA 


18ql2.3 
18qll.2 
5q35.3 


BE408975 46.2 
AI632214 46.2 
BF477544 46.2 
NM 005451.2 45.9 


Above 
Above 
Above 
Above 


17.7 
3.0 
2.9 
8.1 


LOC1491 

53 

DSG2 


lp36.32 
18ql2.1 


AF052109.1 
BF031829 


45.9 
45.9 


Above 
Above 


9.2 
6.7 






cadherin 

91  235333_at  UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 6  B4GALT6  18q12.1  BG503479  45.9  Above  2.0
92  242881_x_at  EST  -  -  BG285837  45.9  Above  11.8
93  200783_s_at  Stathmin 1/oncoprotein 18, leukemia-associated phosphoprotein  STMN1  1p35.1  NM_005563.2  45.8  Above  1.5
94  201334_s_at  Rho guanine nucleotide exchange factor (GEF) 12  ARHGEF12  11q23.3  NM_015313.1  45.8  Above  6.1
95  203038_at  Protein tyrosine phosphatase, receptor type, K  PTPRK  6q22.33  NM_002844.1  45.8  Above  9.1
96  209735_at  ATP-binding cassette, sub-family G (WHITE), member 2  ABCG2  4q22  AF098951.2  45.8  Above  4.5
97  212063_at  Unactive progesterone receptor, 23 kD  P23  12q12  BE903880  45.8  Below  7.4
98  212399_s_at  Hypothetical protein KIAA0121  KIAA0121  3p25.2  D50911.2  45.8  Above  1.8
99  212438_at  Putative nucleic acid binding protein RY-1  RY1  2p13.1  BG252325  45.2  Above  1.7
100  214761_at  OLF-1/early B-cell factor associated zinc finger protein  OAZ  16q12  AW149417  45.2  Above  2.1



Biologic insights from the new class defining genes 

Interestingly, the overall quantitative pattern of expression of discriminating
genes varied significantly between leukemia subtypes (Table 69). Within the B-cell
lineage leukemia subtypes, E2A-PBX1, TEL-AML1, BCR-ABL, and Hyperdiploid
>50 chromosomes were characterized primarily by genes that were overexpressed,
whereas almost 40% of the discriminating genes that characterized MLL fusion gene
expressing leukemias were underexpressed. More remarkably, the discriminating
genes for the leukemia subtypes defined by chimeric transcription factors were
markedly overexpressed, with an average fold increase of 112 and 48 for E2A-PBX1
and TEL-AML1, respectively. By contrast, the discriminating genes for BCR-ABL
and MLL fusion gene expressing leukemias showed an average fold increase of only
6.8 and 8.6, respectively, whereas the discriminating genes for hyperdiploid >50
chromosomes had an average fold increase of only 2.6. These data suggest that
the quantitative global changes in a cell's expression profile vary markedly depending
on the genetic lesion(s) that underlie the initiation of the leukemic process.



Table 69. Summary of fold change by diagnostic subgroup (by gene)

Subgroup             Mean fold change   Range
BCR-ABL              6.8                1.1-90.5
E2A-PBX1             112.0              1.6-5435
Hyperdiploid >50     2.6                1.3-27.2
MLL rearrangement    8.6                1.0-75
T-ALL                387                2.1-7685
TEL-AML1             48.3               1.5-2446
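
The per-subgroup summary in Table 69 can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes that a probe set's fold change is the ratio of the mean signal inside the subgroup to the mean signal in all other cases (inverted when the subgroup lies below the mean), and that the table reports an arithmetic mean; the text does not spell out either choice here. The function names and the sample signals are hypothetical.

```python
from statistics import mean


def fold_change(in_class, others):
    """Return (direction, fold) for one probe set.

    Assumption: fold change is the ratio of the two group means, reported
    as a value >= 1 together with an Above/Below label. Signals are assumed
    to be strictly positive.
    """
    inside, outside = mean(in_class), mean(others)
    if inside >= outside:
        return "Above", inside / outside
    return "Below", outside / inside


def summarize(folds_by_subgroup):
    """Map each subgroup to (mean fold change, smallest, largest)."""
    return {
        name: (mean(folds), min(folds), max(folds))
        for name, folds in folds_by_subgroup.items()
        if folds  # skip subgroups with no discriminating genes
    }


# Hypothetical signals, for illustration only (not taken from the tables).
direction, fold = fold_change([800.0, 1200.0], [100.0, 300.0])
print(direction, fold)

print(summarize({"subtype_A": [2.0, 4.0, 6.0]}))
```

A single very large fold change dominates an arithmetic mean, which is one reason a range is worth reporting next to it.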




Tables 70-74 show genes whose expression is limited to a single B-cell
lineage class, and that therefore function not only as class discriminators in the
decision tree format, but also as class discriminators in a parallel format in which a
class is distinguished against all others. Thus, these genes have the potential of
serving as unique class specific diagnostic or therapeutic targets. In addition, these
genes may provide unique insights into the underlying biology of the different leukemia
subtypes. For example, BCR-ABL expressing ALLs are characterized by the over
expression of Dynactin 4, which encodes a RING finger containing protein that is part
of the 20S dynactin multisubunit complex involved in movement, intracellular
transport and division through its interaction with the cytoplasmic microtubule-based
motor dynein; PSTPIP2, which encodes a proline/serine/threonine phosphatase-
interacting protein that is also involved in controlling the organization of the
cytoskeleton, and is tyrosine phosphorylated following activation of receptor tyrosine
kinases (Karki et al. (2000) J. Biol. Chem. 275:4834-4839); and several novel ESTs.
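
The "parallel format", in which one class is distinguished against all others, can be illustrated with a one-versus-rest chi-square score for a single probe set. The sketch below is hypothetical: it assumes each sample is labeled as above or below the overall mean signal and then scores the resulting 2x2 table (class membership by above/below) with the standard Pearson statistic. The text does not state here how expression values were discretized, so the threshold, the function name, and the sample values are assumptions, and the raw statistic is not expected to match the scale of the chi-square columns in Tables 66-68.

```python
def chi_square_one_vs_rest(values, in_class):
    """Pearson chi-square for one probe set, one class against all others.

    values   -- expression signal per sample
    in_class -- True where the sample belongs to the class of interest
    Assumption: samples are split at the overall mean signal.
    """
    overall_mean = sum(values) / len(values)

    # Counts for the 2x2 table: (class, above), (class, below),
    # (rest, above), (rest, below).
    a = b = c = d = 0
    for value, member in zip(values, in_class):
        above = value > overall_mean
        if member and above:
            a += 1
        elif member:
            b += 1
        elif above:
            c += 1
        else:
            d += 1

    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so nothing is discriminated
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical signals: three samples in the class, three outside it.
signals = [10, 12, 11, 1, 2, 1]
labels = [True, True, True, False, False, False]
print(chi_square_one_vs_rest(signals, labels))
```

A gene that scores highly for exactly one class under this kind of test is the sort of candidate listed in the tables that follow.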
Table 70: Genes highly correlated with BCR-ABL

GenBank Reference   Gene Description
[illegible]         DKFZP564A2416 histone H5 signature
BE218028            Dynactin 4
NM_024600           FLJ20898
NM_024430           Pro-Ser-Thr phosphatase interacting protein 2
AV648669            FLJ39877



E2A-PBX1 expressing leukemias are characterized by the expression of
PBX1, the receptor tyrosine kinase gene C-MERTK, and the FAT tumor suppressor,
which encodes a member of the cadherin repeat domain containing family of
transmembrane proteins (see Table 64). Among the discriminating genes were two
genes, EB-1 and Wnt16, that had previously been shown to be over expressed in this
leukemia subtype (Wu et al. (1998) J. Biol. Chem. 273:30487-30496; and Fu et al.
(1999) Oncogene 18:4920-4929). In addition, the retinal degeneration B beta gene
(McWhirter et al. (1999) Proc. Natl. Acad. Sci. USA 96:11464-11469), and a
number of novel ESTs were identified as being uniquely over expressed in this
leukemia subtype, whereas the SOCS2 negative regulator of cytokine signaling was
found to be under expressed (Fullwood and Hsuan (1999) J. Biol. Chem. 274:31553-
31558).



Table 71: Genes highly correlated with E2A-PBX1

GenBank Reference   Gene Description
NM_012417           retinal degeneration B beta
AI971602            MGC10485
AW005572            EB-1
AL357503            Q9H4T4 like
NM_016087           Wnt16



Hyperdiploid leukemias with >50 chromosomes were characterized by the
over expression of MST4, which encodes a novel serine/threonine kinase (Horvat and
Medrano (2001) Genomics 72:209-212); SH3BP2, which encodes an SH3-domain
containing binding protein (Lin et al. (2001) Oncogene 20:6559-6569); histone
deacetylase 6, which encodes a protein involved in transcriptional repression; the
retinoblastoma binding protein 7 gene, which encodes a protein found in many
functional histone deacetylase complexes (Bell et al. (1997) Genomics 44:163-170);
and TNRC11, a trinucleotide repeat containing gene that is also known as HOPA or
TRAP230 and is part of the thyroid hormone receptor-associated protein (TRAP)
complex (Huang et al. (1991) Nature 350:160-162; and Ito et al. (1999) Mol. Cell
3:361-370).



Table 72: Genes Highly Correlated with Hyperdiploid >50

GenBank Reference    Gene Description
NM_002893            Retinoblastoma binding protein 7
AB000462             SH3-domain binding protein 2
NM_006044            Histone deacetylase 6
BC004354             trinucleotide repeat containing 11
NM_016542            Mst3 and SOK1-related kinase



Cases with MLL gene rearrangements were characterized by the
overexpression of HOXA9 and Meis1 (see Table 66). Included in the up-regulated genes
was a novel transcript from chromosome 20 that was overexpressed almost 25 fold.
This transcript is predicted to encode a protein of 280 amino acids that shows a low
level of homology to a lysosome-associated membrane glycoprotein (LAMP). Also
specifically overexpressed in this leukemia subtype is a gene encoding an insulin-like
growth factor (IGF) II mRNA-binding protein, which has been shown to repress the
translation of the IGF-II growth factor (Armstrong et al. (2002) Nat. Genet.
30:41-47). Among the down-regulated genes were neuron navigator 1 (Nielsen et al.
(1999) Mol. Cell. Biol. 19:1262-1270), which encodes an 1874 amino acid protein and is
involved in directional guidance of migratory cells, and a member of the TCF/LEF
family of transcription factors, TCF-4. TCF-4 functions downstream of β-catenin in
the Wnt-mediated signaling cascade and has been shown to be essential for the
maintenance of intestinal crypt stem cells (Maes et al. (2002) Genomics 80:21-30).

Table 73: Genes Highly Correlated with MLL

GenBank Reference    Gene Description
NM_012261            C20orf103
AI202327             [illegible]
NM_006548            IGF-II mRNA-binding protein 2
NM_018401            gene for serine/threonine protein kinase
NM_018728            myosin 5C
AB032977             neuron navigator 1



Genes that were discriminators of TEL-AML1 leukemias included a gene
localized to chromosome 18q11.1 that encodes a 795 amino acid protein that has 8
ankyrin repeat domains and a C-terminal RING finger domain. This combination of
domains is identified in only a limited number of mammalian proteins, most notably
BARD1, a regulator of the BRCA1 tumor suppressor (Korinek et al. (1998) Nat.
Genet. 19:379-383). Other genes overexpressed in the subtype include desmocollin
(Irminger-Finger and Leung (2002) Int. J. Biochem. Cell Biol. 34:582-587); FLJ12722,
a novel protein of unknown function; and a member of the IAP family of apoptosis
inhibitors, BIRC7, which is overexpressed 25 fold (Whittock et al. (2000) Biochem.
Biophys. Res. Commun. 276:454-460).



Table 74: Genes Highly Correlated with TEL-AML1

GenBank Reference    Gene Description
W80418               KIAA1323
AK022784             FLJ12722
NM_022161            BIRC7
AI452798             FLJ39434
AI797281             Desmocollin 3

Expression profiling accurately identifies the prognostic subtypes of ALL

To assess the accuracy of identifying prognostically important ALL genetic
subtypes by expression profiling, the class discriminating genes identified using a
chi-squared metric were used in an ANN-based supervised learning algorithm. Class
assignment utilized the decision tree differential diagnostic format described
elsewhere herein, and required that the node value for assignment exceed a
statistically defined confidence level. This approach resulted in exceptionally
accurate class prediction in a randomly selected training set that consisted of
three-fourths of the total cases (100 cases). When this classification model was then
applied to a blinded test set consisting of the remaining 32 samples, an overall
accuracy of 97% was achieved for class assignment. To control for over-fitting of the
data, 10 additional rounds of this analysis were performed; in each round new
training and test sets were developed, genes were reselected using the new training
set, and their performance was then assessed on the new test set. This resulted in an
average accuracy of class assignment in the blinded test sets of 97.2%, with a range
from 93.8% to 100%. Although the number of genes required for optimal class assignment
varied between classes, the best overall diagnostic accuracy was achieved using the
top 50 genes per class. A similar level of accuracy was achieved using a variety of
other supervised learning algorithms, including k-NN and SVM.
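
The repeated resampling described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the gene score (spread of class means) and the nearest-centroid classifier are simple stand-ins for the chi-square metric and the ANN, and all function names are hypothetical. The point it illustrates is that genes are reselected from the training cases alone in every round, so the test cases never influence gene selection.

```python
import random
from statistics import mean


def split_cases(n_cases, test_fraction, rng):
    """Shuffle case indices and return (training, test) index lists."""
    idx = list(range(n_cases))
    rng.shuffle(idx)
    n_test = round(n_cases * test_fraction)
    return idx[n_test:], idx[:n_test]


def select_genes(X, y, train_idx, top_n):
    """Rank genes using training cases only (stand-in score: spread of class means)."""
    classes = sorted({y[i] for i in train_idx})
    scored = []
    for g in range(len(X[0])):
        means = [mean(X[i][g] for i in train_idx if y[i] == c) for c in classes]
        scored.append((max(means) - min(means), g))
    scored.sort(reverse=True)
    return [g for _, g in scored[:top_n]]


def predict(X, y, train_idx, genes, case):
    """Assign the case to the class with the nearest training centroid (stand-in for the ANN)."""
    best_class, best_dist = None, float("inf")
    for c in sorted({y[i] for i in train_idx}):
        members = [i for i in train_idx if y[i] == c]
        dist = sum((X[case][g] - mean(X[i][g] for i in members)) ** 2 for g in genes)
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class


def repeated_holdout(X, y, rounds=10, test_fraction=0.25, top_n=50, seed=0):
    """Return the blinded test-set accuracy obtained in each round."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(rounds):
        train_idx, test_idx = split_cases(len(X), test_fraction, rng)
        genes = select_genes(X, y, train_idx, top_n)  # genes reselected every round
        hits = sum(predict(X, y, train_idx, genes, i) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies
```

Here `X` is a list of cases, each a list of signal values, and `y` holds the known subtype of each case; the mean and range of the returned list correspond to the kind of summary reported above.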

Interestingly, of the rare misclassification errors, two were cases of BCR-ABL
expressing ALL that by gene expression analysis were classified as hyperdiploid >50
chromosomes. The karyotypes of these cases showed the presence of both the
Philadelphia chromosome and a hyperdiploid karyotype consisting of >50
chromosomes, including trisomy of chromosomes X and 21 (data not shown). The
expression profile thus correctly identified the presence of the hyperdiploid >50
chromosomes class; however, since each case is assigned to only a single class, the
algorithm failed to correctly identify the presence of BCR-ABL. Nevertheless, the
data presented demonstrate the exceptional accuracy of this single platform for the
diagnosis of the prognostically important subtypes of ALL.

Overview of Experimental Procedure

A. Gene expression profiling 

The preparation of mononuclear cell suspensions from diagnostic bone
marrow aspirates, extraction of total RNA, and preparation of hybridization solutions
were performed as described for Example 1. Individual hybridization solutions from
our previous study had been stored at -80°C since initial hybridization (approximately
1 year). These solutions were thawed and hybridized to Affymetrix® HG-U133A and
HG-U133B oligonucleotide microarrays (Affymetrix Inc., Santa Clara, CA) according
to Affymetrix protocols. In two cases where the original hybridization solutions were
no longer available, replicate viably frozen mononuclear cell preparations from the
diagnostic bone marrow aspirate were obtained, RNA was isolated, and cDNA and cRNA
were synthesized, labeled, fragmented, and hybridized as described for Example 1.

After sample hybridization, arrays were stained with phycoerythrin-conjugated
streptavidin (Molecular Probes, Eugene, OR). Antibody amplification was
performed with biotinylated anti-streptavidin (Vector Laboratories, Burlingame, CA),
followed by staining with phycoerythrin-conjugated streptavidin (Molecular Probes).
Arrays were scanned using a laser confocal scanner (Agilent, Palo Alto, CA) and then
analyzed with Affymetrix® Microarray Suite 5.0 (MAS 5.0). Detection values
(present, marginal or absent) were determined by default parameters, and signal
values were scaled by global methods to a target value of 500. Microarray scan
images were visually inspected for apparent defects, and Affymetrix internal controls
were utilized to monitor the success of hybridization, washing, and staining
procedures. Minimal quality control parameters for inclusion in the study included
greater than 10% present calls and a GAPDH 3'/5' ratio of less than 3. The arrays
included in this study had an average present call of 35.9% for the A chip and 21.0%
for the B chip (combined average of 28.5%).
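
The scaling and quality control criteria above can be sketched in Python as follows. This is an illustration only, not the MAS 5.0 code: the use of a 2% trimmed mean for global scaling is an assumption about how such scaling is commonly performed, and the function and parameter names are hypothetical.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_to_target(signals, target=500.0, trim=0.02):
    """Multiply every signal by one factor so the trimmed mean equals the target."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals]


def passes_qc(detection_calls, gapdh_3p_signal, gapdh_5p_signal,
              min_present_pct=10.0, max_ratio=3.0):
    """Apply the two inclusion criteria: present calls above 10% and GAPDH 3'/5' ratio below 3."""
    n_present = sum(1 for call in detection_calls if call == "P")
    present_pct = 100.0 * n_present / len(detection_calls)
    ratio = gapdh_3p_signal / gapdh_5p_signal
    return present_pct > min_present_pct and ratio < max_ratio
```

An array with 36% present calls and a GAPDH 3'/5' ratio of 1.2 would pass under these criteria, whereas an array with a ratio of 4 would be excluded regardless of its present call rate.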

B. Statistical Analysis 

The dataset was separated into a training set (100 cases) and a test set (32 cases).
The identification of subtype discriminating genes was performed using the training set.
Moreover, both gene discovery and subsequent class predictions were performed
using a differential diagnosis decision tree format. In this format, classification was
performed in a sequential order starting with T-ALL and proceeding in order to
E2A-PBX1, TEL-AML1, BCR-ABL, MLL rearrangement, and Hyperdiploid >50
chromosomes. Unassigned cases were classified as other. Samples classified into the
class under diagnosis were removed prior to proceeding to the next level in the
decision tree. In addition, prior to analysis a variation filter was applied to remove any
probe set that showed minimal variation across the dataset, and thus contributed
minimally, if at all, to the discrimination of leukemia subtypes. Specifically, probe
sets were eliminated from further analysis if the number of cases with a present call
was less than [fraction illegible] the number of samples comprising the leukemia
subgroup under analysis, if the signal value was below 100 in all samples in the
dataset, or if the difference between the maximal and minimal signal values in the
dataset was less than 100. In addition, all signal values with absent or marginal calls
were reset to 1, while probe sets with a present "P" call and a signal below 100 had
the signal reset to 100. The values for signals from the Affymetrix® control sets were
removed prior to analysis.
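
The variation filter and signal resetting described above can be sketched as follows. The sketch makes two assumptions that are not stated in the text: the filter is evaluated on the raw signal values before any values are reset, and the required fraction of present calls is passed in as the hypothetical parameter `min_present_fraction`. Function names are likewise hypothetical.

```python
def keep_probe_set(signals, calls, subgroup_size, min_present_fraction):
    """Return True if a probe set survives the three elimination criteria."""
    n_present = sum(1 for call in calls if call == "P")
    if n_present < min_present_fraction * subgroup_size:
        return False  # too few present calls relative to the subgroup size
    if all(s < 100 for s in signals):
        return False  # signal below 100 in every sample
    if max(signals) - min(signals) < 100:
        return False  # maximal minus minimal signal is less than 100
    return True


def reset_signals(signals, calls):
    """Reset absent/marginal signals to 1 and present signals below 100 to 100."""
    adjusted = []
    for signal, call in zip(signals, calls):
        if call in ("A", "M"):
            adjusted.append(1.0)
        else:
            adjusted.append(max(float(signal), 100.0))
    return adjusted
```

For example, a probe set with signals 500, 520, 540, and 560 would be removed even though every call is present, because its range across the dataset is only 60.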

Unsupervised hierarchical clustering and principal component analysis (PCA)
were performed using GeneMaths software (version 1.5, Applied Maths, Belgium).
Data reduction to define the genes most useful in class distinction was primarily
performed using a chi-square metric. In this procedure, an entropy-based
discretization method was first applied to identify genes whose expression across the
dataset showed differentiation between class and non-class. The assigned
discretized value for the gene was then used in a chi-square calculation to determine
if the association with a class was more than would be expected by random chance.
The stronger the association with the class, the larger the chi-square value calculated.
For genes that could not be discretized, the chi-square value was set to zero.
To evaluate the statistical significance of the discriminating genes, we used a
permutation test in which, for each class, case labels were randomly reassigned to
generate new groups of identical size. The label-permuted data were discretized again
and the chi-square values were recalculated. The permutation test was repeated a
total of 1000 times. The true chi-square value for each probe set was then compared
to the values generated from the 1000 permutations to determine how many times a
chi-square value for a probe set in a randomly labeled group was greater than that
obtained for the true class distinction. A p value was calculated as the number of
times the permuted chi-square value exceeded the true value, divided by the 1000
permutations.
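
A minimal Python sketch of this gene ranking procedure is given below. It is a simplification under stated assumptions: the discretization shown is a single entropy-minimizing cut point without any stopping criterion, which may differ from the entropy-based method actually used, and the p value is expressed as a fraction of the permutations. All function names are hypothetical.

```python
import math
import random


def entropy(labels):
    """Binary class entropy (in bits) of a list of True/False labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def entropy_cut(values, in_class):
    """Cut point minimizing weighted class entropy, or None if no cut is possible."""
    pairs = sorted(zip(values, in_class))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary between equal values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or h < best[0]:
            best = (h, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return None if best is None else best[1]


def gene_chi_square(values, in_class):
    """Chi-square of the discretized gene (high/low) against class membership."""
    cut = entropy_cut(values, in_class)
    if cut is None:
        return 0.0  # gene could not be discretized
    a = sum(1 for v, c in zip(values, in_class) if v > cut and c)
    b = sum(1 for v, c in zip(values, in_class) if v > cut and not c)
    c_ = sum(1 for v, c in zip(values, in_class) if v <= cut and c)
    d = len(values) - a - b - c_
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    return 0.0 if denom == 0 else len(values) * (a * d - b * c_) ** 2 / denom


def permutation_p_value(values, in_class, n_perm=1000, seed=0):
    """Fraction of label permutations whose chi-square exceeds the true value."""
    rng = random.Random(seed)
    observed = gene_chi_square(values, in_class)
    labels = list(in_class)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # group sizes are preserved
        if gene_chi_square(values, labels) > observed:
            exceed += 1
    return exceed / n_perm
```

Because the labels are shuffled rather than redrawn, each permuted group has the same size as the true class, as the procedure requires.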

The discriminating genes selected were then used in supervised learning
algorithms to build classifiers that could identify the specific genetic subgroup.
Algorithms used included k-Nearest Neighbors (k-NN), Support Vector Machine
(SVM), and an artificial neural network (ANN). See Example 1; Witten and Frank
(1999) Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann; Platt (1998) Fast training of support vector
machines using sequential minimal optimization, in Advances in Kernel Methods -
Support Vector Learning, Schölkopf B, Burges C, and Smola A, eds., MIT Press; and
Cover and Hart (1967) IEEE Transactions on Information Theory 13:21-27.
Performance of each model was initially assessed by three-fold cross validation on a
randomly selected stratified training set. True error rates of the best performing
classifiers were then determined using the remaining one-fourth of the samples as a
blinded test group. Class assignment required that a sample's calculated node value
exceed a statistically determined confidence level in order for it to be assigned to a
class. Details of the supervised learning algorithms and their use are described below.
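
Stratified three-fold cross validation of a k-NN classifier can be sketched in Python as follows. This is an illustrative sketch only; the distance measure (squared Euclidean), the value of k, and the function names are assumptions and are not taken from the study.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=3, seed=0):
    """Distribute the cases of each class evenly across the folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % n_folds].append(i)
    return folds


def knn_predict(X, labels, train_idx, case, k=3):
    """Majority vote among the k nearest training cases (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(X[i], X[case])), i) for i in train_idx
    )
    votes = Counter(labels[i] for _, i in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(X, labels, n_folds=3, k=3, seed=0):
    """Hold out each fold in turn and return the overall fraction classified correctly."""
    folds = stratified_folds(labels, n_folds, seed)
    correct = 0
    for f, held_out in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        correct += sum(
            knn_predict(X, labels, train_idx, i, k) == labels[i] for i in held_out
        )
    return correct / len(labels)
```

Stratification keeps the proportion of each subtype roughly constant across folds, which matters when some subtypes are represented by only a small number of cases.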

Detailed Experimental Procedures 

A. Patient Dataset 

132 cases of pediatric ALL were selected from the original 327 diagnostic 
bone marrow aspirates described in Example 1 to reanalyze on the higher density 
U133A and B microarrays. The selection of cases was based on having sufficient 
numbers of each subtype to build accurate class predictions, rather than reflecting the 
actual frequency of these groups in the pediatric population. 

B. Hybridization of microarrays

The hybridization solutions according to Example 1 were thawed at 45°C, then 
microcentrifuged for 5 minutes to remove any insoluble material from the mixture. 
The hybridization solutions were added to U133A chips and allowed to hybridize for 
16 hours at 45°C. At the end of the incubation period, the hybridization solution was 
removed from each U133A chip and refrozen. Subsequently, the hybridization solutions
were thawed and hybridized to the U133B chip.

A non-stringent wash buffer (6X SSPE, 0.01% Tween 20) was added to each
chip cassette after the hybridization solution was removed, and the cassette was
allowed to equilibrate to room temperature. The microarray cassettes were then placed
on the fluidics station and the antibody amplification protocol was performed. The
arrays were washed at 25°C with the non-stringent buffer followed by a more stringent
wash at 50°C with 100 mM MES, 0.1 M NaCl, 0.01% Tween 20. The arrays were then
stained with Streptavidin Phycoerythrin (SAPE, Molecular Probes, Eugene, OR) for
10 minutes at 25°C. Following another non-stringent wash, the arrays were
hybridized for 10 minutes at 25°C with an antibody solution (100 mM MES, 1 M
[Na+], 0.05% Tween 20, 2 mg/ml BSA, 0.1 mg/ml goat IgG, and 3 µg/ml biotinylated
antibody). This solution was removed and the cassettes were restained with the SAPE
solution.

Arrays were scanned on a laser confocal scanner (Agilent, Palo Alto, CA) and
then analyzed with Affymetrix® Microarray Suite 5.0 (MAS 5.0). Detection values
(present, marginal or absent) were determined by default parameters, and signal
values were scaled by global methods to a target value of 500. After completing the
scans, the arrays were visually inspected for defects and Affymetrix internal controls
were utilized to monitor the success of hybridization, washing, and staining
procedures.

C. Statistical methods 

The chi-square metric and the k-NN and ANN supervised learning algorithms
were performed as described for Example 1. The SVM supervised learning algorithm
that was used in this study is available as part of the software package R v1.6.0. See
Ribeiro and Brown, The ISBA Bulletin 8(1):12-16, and www.r-project.org.

To determine the performance of each model using ANN, a confidence
threshold was built for each diagnostic subtype utilizing a modification of the method
described by Khan et al. (2001) Nat. Med. 7:673-679. Models were built based on a
decision tree format where each level of the decision tree contains only two possible
distinctions, class and non-class (for example, T versus non-T). At each level, using
only samples in the training set, 3 ANN models were built by 3-fold cross validation.
The training set samples were then shuffled and 3 additional ANN models were built.
This model building process was repeated a total of 100 times at each step of the
decision tree. An empirical probability distribution for the ANN output node
value was then built only for the subtype under study, for example, T-ALL at the first
step of the decision tree. Only nodal values greater than 0.5 for each subtype were
included. For each individual sample in the training set, the 100 validation subtype
node values were averaged and compared to the threshold. An individual sample was
assigned to the subtype under study only when its average subtype nodal value was
greater than the 95% confidence threshold. For samples in the test set, subtype nodal
values were averaged from all models generated in the 3-fold cross validation. A
sample was assigned to the class under study when the average subtype nodal value was
greater than the 95% confidence level defined on the training set. A sample not
assigned to the subtype progressed to the next level of the decision tree, where the
entire process was repeated.
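
The thresholding and sequential assignment described above can be sketched in Python as follows. The sketch rests on an assumption about how the 95% confidence threshold is derived: it is taken here as the node value exceeded by 95% of the cross-validation outputs above 0.5, which is one possible reading of the text. The ANN itself is not shown; node values are supplied as inputs, and all names are hypothetical.

```python
SUBTYPE_ORDER = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50",
]


def confidence_threshold(node_values, coverage=0.95, floor=0.5):
    """Node value exceeded by roughly `coverage` of the outputs above `floor`.

    Assumption: the threshold is the lower-tail cutoff of the empirical
    distribution of node values for the subtype under study.
    """
    kept = sorted(v for v in node_values if v > floor)
    if not kept:
        raise ValueError("no node values above the floor")
    cut = int(len(kept) * (1.0 - coverage))
    return kept[cut]


def classify(sample_node_values, thresholds, order=SUBTYPE_ORDER):
    """Walk the decision tree in order and assign the first subtype whose
    average node value exceeds its threshold; otherwise return 'Other'."""
    for subtype in order:
        values = sample_node_values[subtype]
        average = sum(values) / len(values)
        if average > thresholds[subtype]:
            return subtype
    return "Other"
```

Because assignment stops at the first level whose threshold is exceeded, a sample carrying the features of two classes receives only the label that comes first in the decision order.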



All publications and patent applications mentioned in the specification are
indicative of the level of those skilled in the art to which this invention pertains. All
publications and patent applications are herein incorporated by reference to the same
extent as if each individual publication or patent application was specifically and
individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be obvious
that certain changes and modifications may be practiced within the scope of the
appended claims.

THAT WHICH IS CLAIMED:

1. A method of assigning a subject affected by leukemia to a leukemia
risk group, said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by leukemia;

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential expression
in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the
subject expression profile to thereby assign said subject affected by leukemia to a
leukemia risk group.

2. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the T-ALL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 7;

b) a value representing the expression level of the gene shown in
Table 14;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 21;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 28;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 35;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 59; and

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 67.

3. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the E2A-PBX1 risk group comprise
values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 3;

b) a value representing the expression level of the gene shown in
Table 10;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 17;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 24;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 31;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 64; and

h) values representing the expression levels of at least one of the
genes shown in Table 71.

4. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the TEL-AML1 risk group comprise
values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 8;

b) values representing the expression levels of the genes shown in
Table 15;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 22;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 29;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 36;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 68; and

h) values representing the expression levels of at least one of the
genes shown in Table 74.

5. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the BCR-ABL risk group comprise
values selected from the group consisting of:

a) values representing the expression level of at least 20 genes
selected from the genes shown in Table 2;

b) values representing the expression levels of the genes shown in
Table 9;

c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 16;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 23;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 30;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 54;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 63; and

h) values representing the expression levels of at least one of the
genes shown in Table 70.



6. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the MLL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 5;

b) values representing the expression levels of the genes shown in
Table 12;

c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 19;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 26;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 33;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 57;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 66; and

h) values representing the expression levels of at least one of the
genes shown in Table 73.

7. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the Hyperdiploid >50 risk group
comprise values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 4;

b) values representing the expression levels of the genes shown in
Table 11;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 18;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 25;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 32;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 56;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 65; and

h) values representing the expression levels of at least one of the
genes shown in Table 72.

8. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the Novel risk group comprise values
selected from the group consisting of:

a) values representing the expression level of at least 20 genes
selected from the genes shown in Table 6;

b) values representing the expression level of the genes shown in
Table 13;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 20;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 27;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 34; and

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 58.

9. The method of claim 1, wherein said sample from said subject affected
by ALL comprises leukemic blasts.

10. The method of claim 9, wherein said sample from said subject affected
by ALL comprises at least 35% leukemic blasts.

11. The method of claim 10, wherein said sample from said subject
affected by ALL comprises at least 75% leukemic blasts.

12. The method of claim 9 wherein said sample comprises leukemic blasts
derived from peripheral blood.

13. The method of claim 9 wherein said sample comprises blast cells
derived from bone marrow.



14. A method of predicting whether a subject affected by leukemia has an 
increased risk of relapse, said method comprising the steps of: 

a) assigning the subject affected by leukemia to a leukemia risk
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1, 
MLL, E2A-PBX1, BCR-ABL, and Novel; 

b) providing a subject expression profile of a sample from said 
subject affected by leukemia; 

c) providing a reference expression profile associated with the 
occurrence of relapse in the leukemia risk group to which the subject affected by 
leukemia is assigned, wherein the subject expression profile and the reference 
expression profile comprise one or more values representing the expression level of a 
gene having differential expression in subjects affected by leukemia who will relapse 
after conventional therapy; and 

d) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with relapse in the 
leukemia risk group to which the subject affected by leukemia is assigned to thereby 
determine whether the subject affected by leukemia has an increased risk of relapse. 

15. The method of claim 14, wherein the step of assigning the subject 
affected by leukemia to a leukemia risk group is performed according to the method 
of claim 1. 

16. The method of claim 14, wherein said subject affected by leukemia is 
assigned to the T-ALL risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 8 genes selected from the genes shown in Table 44. 

17. The method of claim 14, wherein said subject affected by leukemia is
assigned to the Hyperdiploid >50 risk group and said subject expression profile and
said reference expression profile comprise values representing the expression levels of
at least 5 genes selected from the genes shown in Table 45.

18. The method of claim 14, wherein said subject affected by leukemia is
assigned to the TEL-AML1 risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 3 genes selected from the genes shown in Table 46. 

19. The method of claim 14, wherein said subject affected by leukemia is
assigned to the MLL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 5 
genes selected from the genes shown in Table 47. 

20. The method of claim 14, wherein said subject affected by leukemia is 
not assigned to the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or
BCR-ABL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 4 
genes selected from the genes shown in Table 48. 

21. A method of predicting whether a subject affected by TEL-AML1 has
an increased risk of developing secondary AML, said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by TEL-AML1;

b) providing a reference expression profile associated with the
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the
subject expression profile and the reference expression profile comprise one or more
values representing the expression level of a gene having differential expression in
subjects affected by TEL-AML1 who will develop secondary AML; and

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby determine whether the subject affected by TEL-AML1
has an increased risk of developing secondary AML.

22. A method of choosing a therapy for a subject affected by leukemia,
said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by leukemia;

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential
expression in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the
subject expression profile to thereby choose a therapy for the subject affected by
leukemia.

23. A method of choosing a therapy for a subject affected by leukemia,
said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, and Novel;

b) providing a subject expression profile of a sample from said
subject affected by ALL;

c) providing a reference expression profile associated with the
occurrence of relapse in the leukemia risk group to which the subject affected by
leukemia is assigned, wherein the subject expression profile and the reference
expression profile comprise one or more values representing the expression level of a
gene having differential expression in subjects who will relapse after conventional
therapy; and

d) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with relapse in the
leukemia risk group to which the subject affected by ALL is assigned to thereby choose
a therapy for said subject affected by ALL.

24. The method of claim 23, wherein the step of assigning the subject
affected by leukemia to a leukemia risk group is performed according to the method
of claim 1.
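
By way of non-limiting illustration only, the determination in step d) of claim 23 might be sketched as a thresholded comparison. The sketch assumes cosine similarity and a cutoff of 0.8, neither of which is specified by the claim; the gene names and values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length value lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero profile is treated as dissimilar
    return dot / (norm_u * norm_v)

def shares_sufficient_similarity(subject, relapse_reference, threshold=0.8):
    """True if the subject profile meets the (assumed) similarity cutoff.

    Only genes present in the relapse reference profile are compared.
    """
    genes = sorted(relapse_reference)
    u = [subject[g] for g in genes]
    v = [relapse_reference[g] for g in genes]
    return cosine_similarity(u, v) >= threshold

# Hypothetical relapse-associated reference for one risk group.
relapse_reference = {"gene_x": 2.2, "gene_y": 0.4, "gene_z": 2.9}
similar_subject = {"gene_x": 2.0, "gene_y": 0.5, "gene_z": 3.0}
dissimilar_subject = {"gene_x": 0.1, "gene_y": 3.0, "gene_z": 0.2}
```

A subject for whom the function returns True would, in this sketch, be treated as resembling the relapse-associated profile of the assigned risk group, and the therapy chosen accordingly.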

25. The method of claim 23, wherein said subject affected by leukemia is
assigned to the T-ALL risk group and said subject expression profile and said
reference expression profile comprise values representing the expression levels of at
least 8 genes selected from the genes shown in Table 44.

26. The method of claim 23, wherein said subject affected by leukemia is
assigned to the Hyperdiploid >50 risk group and said subject expression profile and
said reference expression profile comprise values representing the expression levels of
at least 5 genes selected from the genes shown in Table 45.

27. The method of claim 23, wherein said subject affected by leukemia is
assigned to the TEL-AML1 risk group and said subject expression profile and said
reference expression profile comprise values representing the expression levels of at
least 3 genes selected from the genes shown in Table 46.

28. The method of claim 23, wherein said subject affected by leukemia is
assigned to the MLL risk group and said subject expression profile and said reference
expression profile comprise values representing the expression levels of at least 5
genes selected from the genes shown in Table 47.

29. The method of claim 23, wherein said subject affected by leukemia is
not assigned to the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or
BCR-ABL risk group and said subject expression profile and said reference
expression profile comprise values representing the expression levels of at least 4
genes selected from the genes shown in Table 48.

30. A method of choosing a therapy for a subject affected by TEL-AML1,
said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by TEL-AML1;

b) providing a reference expression profile associated with the
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the
subject expression profile and the reference expression profile comprise one or more
values representing the expression level of a gene having differential expression in
subjects affected by TEL-AML1 who will develop secondary AML; and

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby choose a therapy for the subject affected by TEL-AML1.

31. The method of claim 30, wherein said subject expression profile and
said reference expression profile comprise values representing the expression levels of
at least 7 genes selected from the genes shown in Table 48.

32. A method to aid in the determination of a prognosis for a subject 
affected by leukemia, said method comprising: 

a) providing a subject expression profile of a sample from said 
subject affected by leukemia; 

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential
expression in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the 
subject expression profile to thereby determine the prognosis for the subject affected 
by leukemia. 

33. A method to aid in the determination of the prognosis for a subject
affected by leukemia, said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, or Novel risk group;

b) providing a subject expression profile of a sample from said
subject affected by leukemia;

c) providing a reference expression profile associated with the
occurrence of relapse in the leukemia risk group to which the subject affected by
leukemia is assigned, wherein the subject expression profile and the reference
expression profile comprise one or more values representing the expression level of a
gene having differential expression in subjects who will relapse after conventional
therapy; and

d) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with relapse in the
leukemia risk group to which the subject affected by leukemia is assigned to thereby
determine the prognosis for the subject affected by leukemia.

34. A method to aid in the determination of the prognosis for a subject 
affected by TEL-AML1, said method comprising: 

a) providing a subject expression profile of a sample from said 
subject affected by TEL-AML1;

b) providing a reference expression profile associated with the
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the
subject expression profile and the reference expression profile comprise one or more
values representing the expression level of a gene having differential expression in
subjects affected by TEL-AML1 who will develop secondary AML after conventional
therapy; and

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby determine the prognosis for the subject affected by
TEL-AML1.

35. A method of assigning a subject affected by ALL to an ALL risk group
selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL,
MLL, Hyperdiploid >50, and Novel, said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by ALL;

b) providing a reference expression profile associated with the
T-ALL risk group wherein the subject expression profile and the reference expression
profile comprises one or more values representing the expression level of a gene
having differential expression in the T-ALL risk group;

c) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the T-ALL risk group to thereby determine whether the subject affected by ALL is in
the T-ALL risk group;



d) if the subject affected by ALL is not in the T-ALL risk group,
providing a reference expression profile associated with the E2A-PBX1 risk group 
wherein the subject expression profile and the reference expression profile comprises 
one or more values representing the expression level of a gene having differential 
expression in the E2A-PBX1 risk group; 

e) determining whether the subject expression profile shares 
statistically significant similarity to the reference expression profile associated with 
the E2A-PBX1 risk group to thereby determine whether the subject affected by ALL 
is in the E2A-PBX1 risk group; 

f) if the subject affected by ALL is not in the E2A-PBX1 risk
group, providing a reference expression profile associated with the TEL-AML1 risk
group wherein the subject expression profile and each reference expression profile
comprises one or more values representing the expression level of a gene having
differential expression in the TEL-AML1 risk group;

g) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the TEL-AML1 risk group to thereby determine whether the subject affected by ALL
is in the TEL-AML1 risk group;






h) if the subject affected by ALL is not in the TEL-AML1 risk
group, providing a reference expression profile associated with the BCR-ABL risk 
group wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 
differential expression in the BCR-ABL risk group; 

i) determining whether the subject expression profile shares 
statistically significant similarity to the reference expression profile associated with 
the BCR-ABL risk group to thereby determine whether the subject affected by ALL is 
in the BCR-ABL risk group; 

j) if the subject affected by ALL is not in the BCR-ABL risk 
group, providing a reference expression profile associated with the MLL risk group 
wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 
differential expression in the MLL risk group; 

k) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with 
the MLL risk group to thereby determine whether the subject affected by ALL is in 
the MLL risk group; 

l) if the subject affected by ALL is not in the MLL risk group,
providing a reference expression profile associated with the Hyperdiploid >50 risk 
group wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 
differential expression in the Hyperdiploid >50 risk group; 

m) determining whether the subject expression profile shares 
statistically significant similarity to the reference expression profile associated with 
the Hyperdiploid >50 risk group to thereby determine whether the subject affected by
ALL is in the Hyperdiploid >50 risk group; 

n) if the subject affected by ALL is not in the Hyperdiploid >50 
risk group, providing a reference expression profile associated with the Novel risk 
group wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 
differential expression in the Novel risk group; and

o) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the Novel risk group to thereby determine whether the subject affected by ALL is in
the Novel risk group.
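
By way of non-limiting illustration only, the ordered sequence of steps b) through o) of claim 35 might be sketched as a cascade that tests one risk group at a time and stops at the first match. The predicate `is_significantly_similar` is a hypothetical stand-in for whatever statistical test is used; the claim does not prescribe one, and the toy predicate and values below are placeholders.

```python
# Order in which claim 35 tests the risk groups.
RISK_GROUP_ORDER = [
    "T-ALL",
    "E2A-PBX1",
    "TEL-AML1",
    "BCR-ABL",
    "MLL",
    "Hyperdiploid >50",
    "Novel",
]

def assign_risk_group(subject, references, is_significantly_similar):
    """Return the first risk group whose reference profile matches, else None.

    `references` maps each risk group name to its reference profile.
    `is_significantly_similar(subject, reference)` is a caller-supplied
    predicate (hypothetical here) returning True or False.
    """
    for group in RISK_GROUP_ORDER:
        if is_significantly_similar(subject, references[group]):
            return group  # later groups are not tested once a match is found
    return None  # the subject matched none of the seven groups

# Toy predicate and data for illustration only: each profile is one number.
def toy_predicate(subject, reference):
    return abs(subject["score"] - reference["score"]) < 0.5

toy_references = {g: {"score": float(i)} for i, g in enumerate(RISK_GROUP_ORDER)}
assigned = assign_risk_group({"score": 3.1}, toy_references, toy_predicate)
```

Because each group is tested only if all earlier groups were rejected, the order of the list reproduces the conditional "if the subject is not in the preceding risk group" language of the claim.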

36. An array for use in a method of assigning a subject affected by
leukemia to a leukemia risk group comprising a substrate having a plurality of 
addresses, wherein each address has disposed thereon a capture probe that can 
specifically bind a nucleic acid molecule selected from the group consisting of: 

a) a nucleic acid molecule that is differentially expressed in at 
least one leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel;

b) a nucleic acid molecule that is differentially expressed in 
subjects affected by leukemia who will relapse after conventional therapy; and 

c) a nucleic acid molecule that is differentially expressed in 
subjects affected by leukemia who will develop secondary AML after conventional 
therapy. 

37. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in at least one leukemia risk group is selected from the group 
consisting of the genes shown in Tables 2-36, 63-68, and 70-74. 

38. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in subjects affected by leukemia who will relapse after 
conventional therapy is selected from the group consisting of the genes shown in 
Tables 44-48. 

39. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in subjects affected by leukemia who will develop secondary 
AML after conventional therapy is selected from the group consisting of the genes 
shown in Table 52.

40. The array of claim 36, wherein the substrate has greater than 20
addresses.

41. The array of claim 40, wherein the substrate has greater than 40
addresses.

42. The array of claim 41, wherein the substrate has greater than 68
addresses.

43. The array of claim 36, wherein the substrate has no more than 500
addresses.



44. A kit for assigning a subject affected by ALL to a leukemia risk group, 
said kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 
nucleic acid molecule that is differentially expressed in at least one leukemia risk 
group selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1,
BCR-ABL, MLL, Hyperdiploid >50, and Novel; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by 
the array. 



45. A kit for assigning a subject affected by ALL to a leukemia risk group,
said kit comprising:

a) an array according to claim 37; and

b) a computer-readable medium having a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a nucleic acid molecule detected by
the array.






46. A kit for predicting whether a subject affected by leukemia has an
increased risk of relapse, said kit comprising:

a) an array comprising a substrate having a plurality of addresses,
wherein each address has disposed thereon a capture probe that can specifically bind a
nucleic acid molecule that is differentially expressed in subjects affected by leukemia
who will relapse following conventional therapy; and

b) a computer-readable medium having a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a nucleic acid molecule detected by
the array.



47. A kit for predicting whether a subject affected by leukemia has an
increased risk of relapse, said kit comprising:

a) an array according to claim 38; and

b) a computer-readable medium having a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a
plurality of values, each value representing the expression of a nucleic acid
molecule detected by the array.



48. A kit for predicting whether a subject affected by TEL-AML1 has an 
increased risk of relapse, said kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 
nucleic acid molecule that is differentially expressed in subjects affected by
TEL-AML1 who will relapse after conventional therapy; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by 
the array. 

49. A kit for predicting whether a subject affected by TEL-AML1 has an 
increased risk of relapse, said kit comprising: 

a) an array according to claim 39; and

b) a computer-readable medium having a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a
plurality of values, each value representing the expression of a nucleic acid
molecule detected by the array.

50. A kit to aid in choosing therapy for a subject affected by leukemia, said 
kit comprising: 

a) an array comprising a substrate having a plurality of addresses,
wherein each address has disposed thereon a capture probe that can specifically bind a
nucleic acid molecule that is differentially expressed in at least one leukemia risk
group selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1,
BCR-ABL, MLL, Hyperdiploid >50, and Novel; and

b) a computer-readable medium having a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a nucleic acid molecule detected by
the array.

51. A kit to aid in choosing therapy for a subject affected by leukemia, said 
kit comprising: 

a) an array according to claim 37; and

b) a computer-readable medium having a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a
plurality of values, each value representing the expression of a nucleic acid
molecule detected by the array.

52. A computer-readable medium comprising a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a gene that is differentially
expressed in at least one leukemia risk group selected from the group consisting of
T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel.
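
By way of non-limiting illustration only, the plurality of digitally-encoded expression profiles recited in claim 52 might be stored as a JSON document that maps each risk group to its gene expression values. JSON is an assumed encoding chosen for the sketch; the claim does not require any particular format, and the gene names and values are hypothetical.

```python
import json

def encode_profiles(profiles):
    """Serialize {risk group: {gene: expression value}} to a JSON string."""
    return json.dumps(profiles, sort_keys=True, indent=2)

def decode_profiles(text):
    """Recover the profile mapping from its JSON encoding."""
    return json.loads(text)

# Hypothetical profiles for illustration only.
profiles = {
    "T-ALL": {"gene_a": 9.0, "gene_b": 2.0},
    "Hyperdiploid >50": {"gene_a": 3.5, "gene_b": 6.25},
}
encoded = encode_profiles(profiles)   # text that could be written to a file
restored = decode_profiles(encoded)   # same mapping read back
```

Each profile here holds a plurality of values keyed by gene, so a stored file of this kind could be paired with an array whose probes detect the same genes.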



53. The computer readable medium of claim 52, wherein the expression
profiles comprise values selected from the group consisting of:

a) values representing the expression levels of at least 7 genes
selected from the genes shown in Tables 2-8, 16-36, 54-60, and 63-68;

b) a value representing the expression level of the gene shown in
Table 10;

c) a value representing the expression level of the gene shown in
Table 14;

d) values representing the expression levels of the genes shown in
Tables 9, 11, 12, 13, and 15; and

e) values representing the expression level of at least one gene
shown in Tables 70, 71, 72, 73, and 74.

54. A computer-readable medium comprising a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a gene that is differentially
expressed in subjects affected by leukemia who will relapse following conventional
therapy.

55. The computer readable medium of claim 54, wherein the expression
profiles comprise values selected from the group consisting of:

a) values representing the expression levels of at least 8 genes
selected from the genes shown in Table 44;

b) values representing the expression levels of at least 5 genes
selected from the genes shown in Table 45;

c) values representing the expression levels of at least 3 genes
selected from the genes shown in Table 46;

d) values representing the expression levels of at least 5 genes
selected from the genes shown in Table 47; and

e) values representing the expression levels of at least 4 genes
selected from the genes shown in Table 48.

56. A computer-readable medium comprising a plurality of digitally-encoded
expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a gene that is differentially
expressed in subjects affected by leukemia who will develop secondary AML.



57. The computer readable medium of claim 56, wherein the expression
profiles comprise values selected from values representing the expression levels of at
least 7 genes selected from the genes shown in Table 52.



58. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the T-ALL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 7;

b) a value representing the expression level of the gene shown in
Table 14;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 21;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 28;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 35; and

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 59.

59. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the E2A-PBX1 risk group comprise
values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 3;

b) a value representing the expression level of the gene shown in
Table 10;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 17;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 24;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 31;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 64; and

h) values representing the expression levels of at least one of the
genes shown in Table 71.



60. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the TEL-AML1 risk group comprise
values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 8;

b) values representing the expression levels of the genes shown in
Table 15;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 22;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 29;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 36; and

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55.

61. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the BCR-ABL risk group comprise
values selected from the group consisting of:

a) values representing the expression level of at least 20 genes
selected from the genes shown in Table 2;

b) values representing the expression levels of the genes shown in
Table 9;

c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 16;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 23;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 30; and

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 54.

62. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the MLL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 5;

b) values representing the expression levels of the genes shown in
Table 12;

c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 19;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 26;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 33; and

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 57.

63. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the Hyperdiploid >50 risk group
comprise values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 4;

b) values representing the expression levels of the genes shown in
Table 11;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 18;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 25;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 32; and

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 56.

64. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in at least one leukemia risk group is selected from the group 
consisting of the genes shown in Tables 2-36. 
